The Impact of Performance Asymmetry in Multicore Architectures
Saisanthosh Balakrishnan, Ravi Rajwar, Michael Upton, Konrad Lai
UW-Madison and Intel Corp.
32nd Annual International Symposium on Computer Architecture

Performance asymmetry
Performance asymmetry is a difference in the compute power of processors. It can come from:
 - Architectural differences
 - Micro-architectural parameters
 - Other sources, e.g., heat (thermal throttling)
[Figure: fast (F) and slow (S) cores: F F S S]
Why asymmetry now?
 - CMPs with many cores are becoming commodity systems. They must:
   - Run a variety of workloads
   - Deliver good serial performance and high throughput
   - Keep energy consumption optimal
 - Assume an asymmetric multicore system

Asymmetry and multithreaded (MT) workloads
 - Scalable? Performance vs. compute power of N processors in different configurations. Asymmetry must be utilized: configurations with more fast (F) cores should perform better.
 - Stable? Performance of N processors in the same configuration over many runs. Performance must be predictable and robust.

The problems
 - Programmers think about the algorithm, correctness, and thread partitioning, but do not reason about asymmetry
 - Characteristics of threads matter: partitioning, synchronization barriers, interference, lifetime
 - Threads are scheduled at many levels: the OS kernel, libraries, applications, DB/Web servers, and managed runtime systems (Java, .NET)

Contributions
 - Asymmetry negatively affects applications
   - Studied many workloads on real hardware
   - Observed unpredictable workload behavior
 - This can be fixed by
   - Evaluating threads' work partitioning
   - Scheduling threads with asymmetry in mind

Outline
 - Asymmetry and performance
 - Evaluation methodology
 - Asymmetric configurations
 - Workloads and results

Evaluation methodology
 - Asymmetry in real hardware
   - Intel 4-way 3-GHz Xeon
   - Different cores run at different frequencies, under software control
 - Benefits
   - Long real-time runs (no simulation)
   - Workloads set up according to their specifications
   - Representative of other forms of asymmetry (communication, micro-architecture, etc.)
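Because asymmetry in this study comes from running some cores at a reduced clock frequency, a configuration's aggregate compute power, and with it an upper bound on its speedup over the all-slow baseline, can be estimated with simple arithmetic. The sketch below is a minimal illustration under stated assumptions: a 4-core system, slow cores at one-eighth of full frequency, and throughput that scales linearly with frequency. The linear-scaling assumption is an idealization, not a measurement from the talk, and the function names are hypothetical.

```python
from fractions import Fraction

FAST = Fraction(1)     # relative speed of a full-frequency (F) core
SLOW = Fraction(1, 8)  # assumed speed of a slow (S) core: 1/8 of full frequency
CORES = 4              # 4-way system


def compute_power(num_slow: int, slow: Fraction = SLOW, cores: int = CORES) -> Fraction:
    """Aggregate compute power in units of one full-frequency core,
    assuming throughput scales linearly with clock frequency."""
    if not 0 <= num_slow <= cores:
        raise ValueError("num_slow must be between 0 and cores")
    return (cores - num_slow) * FAST + num_slow * slow


def ideal_speedups(slow: Fraction = SLOW, cores: int = CORES) -> dict[str, Fraction]:
    """Upper-bound speedup of each configuration over the all-slow baseline."""
    baseline = compute_power(cores, slow, cores)
    speedups = {}
    for k in range(cores + 1):
        if k == 0:
            name = "all fast"
        elif k == cores:
            name = "all slow"
        else:
            name = f"{k} slow"
        speedups[name] = compute_power(k, slow, cores) / baseline
    return speedups


if __name__ == "__main__":
    for name, s in ideal_speedups().items():
        print(f"{name:>8}: {float(s):.2f}x")
```

Run as a script, it reports 8.00x for all fast, 6.25x for 1 slow, 4.50x for 2 slow, 2.75x for 3 slow, and 1.00x for all slow. Passing `slow=Fraction(1, 4)` models the one-fourth setting instead, which lowers the all-fast bound to 4x.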
Configurations
 Symmetric:   all fast   F F F F
              all slow   S S S S
 Asymmetric:  1 slow     F F F S
              2 slow     F F S S
              3 slow     F S S S
 F = full frequency
 S = one-eighth of full frequency (in the talk and the paper)
 S = one-fourth of full frequency (in the paper)

Studying impact
 - Scalability: the performance metric across all fast, 1 slow, 2 slow, 3 slow, and all slow
 - Stability (asymmetric configurations): the performance metric for the same configuration over many runs

Workloads evaluated
 - SPECjbb, SPECjAppServer: middle-tier business applications; throughput parallel
 - Apache, Zeus: Web servers; throughput parallel
 - TPC-H, SPECOMP, H.264: task-based parallelization
 - PMake: embarrassingly parallel

Impact of asymmetry (Yes/No; "-" = none listed)
 Workload        Scalable  Stable  Fix
 SPECjbb         Yes       No      Yes
 SPECjAppServer  Yes       Yes     -
 Apache          Yes       No      Yes
 Zeus            No        No      No
 TPC-H           Yes       No      Yes
 SPECOMP         No        No      Yes
 H.264           Yes       Yes     -
 PMake           Yes       Yes     -

SPECjbb and SPECjAppServer
 - Managed runtime systems (BEA JRockit and Sun HotSpot) on Windows 2003 and Linux
 - Two GCs: parallel and generational concurrent; only minor GC
 - Up to 20 threads; minimal communication

SPECjbb
 Scalable? Yes   Stable? No (Yes with kernel fix)
 [Figure: Stability (JRockit, gencon GC) on 2 slow: transactions per second (thousands) vs. warehouses over 4 runs, without and with the kernel fix]
 - Problem: interference from the runtime system (JVM, GC)
 - Fix: the kernel scheduler moves jobs from slow to fast cores when a fast core is free

Apache and Zeus
 - Web servers on Linux: thread-based vs. event-based models
 - Load generated with ApacheBench: raw performance with a static page
 - Light and heavy loads

Apache
 Scalable? Yes   Stable? No
 [Figure: Scalability and stability (light load): speedup over all slow for all fast, 1 slow, 2 slow, 3 slow, all slow]
 - Problem: under light load, threads may run on either fast or slow cores
 - No issues under heavy load
 - Fixes: the kernel scheduler, or shorter thread lifetimes

Zeus
 Scalable? No   Stable? No
 [Figure: Scalability and stability: speedup over all slow for all fast, 1 slow, 2 slow, 3 slow, all slow]
 - Unpredictable under both heavy and light loads
 - Superior performance
   on symmetric configurations
 - Problem: aggressive application-level scheduling

SPECOMP, H.264, and PMake
 - SPECOMP: scientific applications; loop-based parallelization; Intel Fortran, OpenMP on Linux
 - H.264: media encoding; OpenMP on Windows 2003
 - PMake: parallel make of the Linux kernel

SPECOMP
 Scalable? No   Stable? No
 [Figure: Scalability: speedup over all slow for all fast, 1 slow, 2 slow, 3 slow, all slow, without and with the application fix]
 - OpenMP schedules tasks assuming processors of equal performance
 - Problem: fast processors are held up by slow ones
 - Fix: change task scheduling to on-demand
 - Downside: scheduling overhead

H.264 and PMake
 Scalable? Yes   Stable? Yes
 [Figure: speedup over all slow for H.264 and for PMake across all fast, 1 slow, 2 slow, 3 slow, all slow]
 - H.264 slows down significantly with one slow processor
 - It speeds up with one fast processor (relative to all slow)
 - PMake scales linearly across all configurations

Impact of asymmetry (summary)
 - SPECjbb, SPECjAppServer: interference from the runtime system; the garbage collector is not aware of asymmetry. SPECjAppServer is a robust, multi-tier application.
   Fix: migrate tasks from a slow to a fast core if one is free
 - Apache: unpredictable under light loads
   Fix: migrate tasks, or recycle threads
 - Zeus: superior performance on symmetric configurations; very responsive to load
   Fix: the application scheduler
 - TPC-H: intra-query parallelization; behavior is application dependent
   Fix: consider asymmetry in the query optimization engine
 - SPECOMP: fast cores are held up by slow ones
   Fix: assign tasks on demand instead of up front (high overhead), or make OpenMP understand asymmetry
 - H.264: reconsider the application's parallelization; reduce synchronization barriers
 - PMake: threads are well balanced and abundant and can map to fast or slow processors; multiprogramming with several tasks

Conclusions
 - Asymmetric systems
   - Good for energy and performance
   - But can introduce unpredictability
 - Software needs to understand asymmetry
   - Evaluate the application's work partitioning
   - Schedule tasks with asymmetry in mind; mostly no other changes are needed
   - Possibly feedback based
 - Suitable asymmetry: many slow and few fast processors

Questions?
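The SPECOMP problem and its on-demand fix can be illustrated with a toy makespan model: equal-sized tasks are either split evenly across cores up front or handed to whichever core becomes free first, roughly analogous to OpenMP's `schedule(static)` and `schedule(dynamic)` loop schedules. This is a minimal sketch under stated assumptions, not the authors' fix or OpenMP's runtime: each task is assumed to be one unit of full-speed work, a slow core is assumed to run at one-eighth speed, and the function names and the per-task `overhead` parameter are hypothetical.

```python
import heapq
from fractions import Fraction


def static_makespan(num_tasks: int, speeds: list[Fraction]) -> Fraction:
    """Up-front assignment: split equal-sized tasks as evenly as possible
    across cores, ignoring core speed. Each task is one unit of work at
    full speed, so it takes 1/speed time on a core of relative `speed`."""
    base, extra = divmod(num_tasks, len(speeds))
    finish_times = [
        Fraction(base + (1 if i < extra else 0)) / speed
        for i, speed in enumerate(speeds)
    ]
    return max(finish_times)


def on_demand_makespan(num_tasks: int, speeds: list[Fraction],
                       overhead: Fraction = Fraction(0)) -> Fraction:
    """On-demand assignment: the core that becomes free first takes the
    next task. `overhead` is a hypothetical fixed dispatch cost per task."""
    ready = [(Fraction(0), i) for i in range(len(speeds))]  # (free time, core)
    heapq.heapify(ready)
    makespan = Fraction(0)
    for _ in range(num_tasks):
        free_at, core = heapq.heappop(ready)
        done = free_at + overhead + 1 / speeds[core]
        makespan = max(makespan, done)
        heapq.heappush(ready, (done, core))
    return makespan


if __name__ == "__main__":
    one_slow = [Fraction(1)] * 3 + [Fraction(1, 8)]  # the "1 slow" configuration
    print(static_makespan(32, one_slow))     # 64: the slow core's 8 tasks finish last
    print(on_demand_makespan(32, one_slow))  # 16
```

With on-demand assignment the fast cores finish at time 10, but the slow core picks up a task at time 8 and completes it at 16. A nonzero `overhead` lengthens every task, which mirrors the downside noted for the SPECOMP fix.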