Incremental Call-Path Profiling

Andrew Bernat
Computer Sciences Department
University of Wisconsin-Madison
Madison, WI 53706 USA
April 14, 2004
© 2004 Andrew R. Bernat

Dynamic Call-Path Profiling
[Figure: call graph with nodes main and do_work]
[Figure: call graph with nodes main, lookup, do_work, MPI_Recv, malloc, hash_lookup, strcpy]

Point Profiler (Length 1)
[Figure: call graph with nodes main and do_work; label: 96% CPU]

Edge Profiler (Length 1)
[Figure: call graph with nodes main and do_work; labels: 40%, 53%, 96% CPU]

Path Profiler (Length 3)
[Figure: call graph with nodes main and do_work; labels: 50%, 36%, 40%, 53%, 96% CPU]

Full Call-Path Profiling
[Figure: call graph with nodes main and do_work; labels: 53%, 43%, 50%, 36%, 40%, 53%, 96% CPU]

Call-Path Profiling Disassembled
Profiling functions is easy
Determining the call-path is hard
• Efficiency: cost per function invocation
• Safety: must not affect the program's behavior
• Correctness

Call-Path Profilers
Provide path-profile data for every function in the program.
Two categories:
• Sample-based (gprof, CPPROF)
• Instrumenting profilers (PP, TAU, others)

Sampling Call-Path Profilers
Periodically pause the program
• Note active function
• Record call-path (current stack)
• Some profilers sample CPU usage
Advantages:
• Complete call-path information
Disadvantages:
• Imprecise (sampling-based)
• Limited metrics available
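
The sampling approach on the slide above (pause periodically, record the current stack) can be sketched in a few lines of Python. This is only an illustration and does not reflect how gprof or CPPROF are implemented: the names sample_call_paths, do_work, and main, and the 5 ms interval, are assumptions made for the example. A helper thread stands in for the periodic pause and uses the CPython call sys._current_frames() to read the target thread's stack.

```python
import sys
import threading
import time
from collections import Counter


def sample_call_paths(target, interval=0.005):
    """Run target() while a helper thread periodically records its call-path.

    Returns a Counter mapping call-path (tuple of function names,
    outermost first) to the number of samples taken on that path.
    """
    samples = Counter()
    target_id = threading.get_ident()  # the thread that will run target()
    done = threading.Event()

    def sampler():
        # wait() returns False on timeout, so this loop wakes up once per
        # interval until the profiled call has finished.
        while not done.wait(interval):
            frame = sys._current_frames().get(target_id)
            path = []
            while frame is not None:  # walk from the active function outward
                path.append(frame.f_code.co_name)
                frame = frame.f_back
            if path:
                samples[tuple(reversed(path))] += 1

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        target()
    finally:
        done.set()
        thread.join()
    return samples


def do_work(seconds=0.2):
    """Hypothetical CPU-bound function; spins for roughly `seconds`."""
    end = time.perf_counter() + seconds
    total = 0
    while time.perf_counter() < end:
        total += 1
    return total


def main():
    do_work()


if __name__ == "__main__":
    for path, count in sample_call_paths(main).most_common():
        print(count, " -> ".join(path))
```

The sample counts vary from run to run, which is the imprecision the slide lists as a disadvantage: the profile is only as fine-grained as the sampling interval.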

Instrumenting Profilers
Track the current call-path
• Stack of active functions
• Maintain a pointer to the current call-path
Record metrics for all functions
• Counters, CPU usage, wall time
Disadvantages:
• Incomplete (can miss recursion, dynamic calls)
• Expensive (instrumentation at entries, exits, call sites)
• Only supports limited, inexpensive metrics

Incremental, Dynamic Call-Path Profiling
Incremental: Only profile functions of interest to the user
• "Paradyn approach"
Dynamic: Allow "on-the-fly" profiling
• Global analysis unnecessary
Cost Effective: Reduce overall cost
Complete: User still gets complete call-path information

Incremental, Dynamic Call-Path Profiling
Capture the call-path with a stack walk from within the process.
• Includes dynamic calls and recursion
• Makes tracing function calls unnecessary
Walk the stack at function entries and exits.
Cost only incurred when profiled functions are executed.
• Allows use of more expensive metrics

iPath, a Prototype Incremental Call-Path Profiler
Allows use of arbitrary performance metrics.
• PMAPI (AIX), PAPI (Linux)
• Counters, timers, and arbitrary combinations
Profiles user-selected functions
Uses Dyninst
• Traces unmodified binaries

iPath Implementation
Instrumentation is contained in a run-time library.
• User defines wanted metrics
Maintain a table for each function profiled
• Stack walk and associated performance data for each detected call-path
Update the table at function entry and exit
Results available on the fly

iPath in Action
We applied iPath to two applications: the Paradyn daemon and the MILC QCD simulation framework.
Paradyn daemon: identified and fixed a serious bottleneck in address -> function mapping.
MILC: identified and fixed a communication bottleneck.

Paradyn Daemon
Top level: Performance Consultant was slow
Identified a bottleneck in address -> function mapping.
• Parsing: target of a call-site
• Runtime: identifying functions on the stack
Call-path analysis showed the lookup function performed horribly along only one path.
We optimized the function for that path.
Result: 98% decrease in instrumentation time!

MILC
Parallel computation framework for quantum chromodynamics simulations.
We analyzed MPI performance using iPath and focused on frequently executed paths.
We identified two bottlenecks, one of which we fixed.
We reduced the number of times MPI functions were called and replaced calls to reduce synchronization time.
Result: 45% decrease in execution time

Summary
Call-path profiling is a useful technique, but current methods are incomplete.
Increase flexibility and reduce cost by profiling particular functions instead of the whole program.
Come see the demo!

Questions?
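
Backup sketch: the per-function table described under "iPath Implementation" (one entry per detected call-path, updated at function entry and exit) might look roughly like the Python below. This is not iPath's code. iPath instruments unmodified binaries through Dyninst, whereas this sketch uses a decorator and CPython's sys._getframe() purely to show the bookkeeping. The names TABLES, profiled, walk_stack, read_metric, lookup, parse, runtime, and main are all hypothetical, and as a simplification the stack is walked once at entry and reused at exit.

```python
import functools
import sys
import time
from collections import defaultdict

# Hypothetical layout: function name -> {call-path -> [calls, total metric]}.
TABLES = defaultdict(lambda: defaultdict(lambda: [0, 0.0]))

# Any zero-argument metric reader could be swapped in here; a timer is
# used as a stand-in for the counters and timers named on the slides.
read_metric = time.perf_counter


def walk_stack():
    """Return the names of the active functions, outermost first."""
    frame = sys._getframe(1)
    path = []
    while frame is not None:
        # Skip the profiler's own wrapper frames so that only program
        # functions appear in the recorded call-path.
        if frame.f_code.co_name != "_profiled_call":
            path.append(frame.f_code.co_name)
        frame = frame.f_back
    return tuple(reversed(path))


def profiled(func):
    """Mark one user-selected function for call-path profiling."""
    @functools.wraps(func)
    def _profiled_call(*args, **kwargs):
        path = walk_stack() + (func.__name__,)  # entry: capture the call-path
        start = read_metric()
        try:
            return func(*args, **kwargs)
        finally:
            entry = TABLES[func.__name__][path]  # exit: update the table
            entry[0] += 1
            entry[1] += read_metric() - start
    return _profiled_call


@profiled
def lookup(key):
    """Hypothetical function of interest, reached along two paths."""
    return len(key)


def parse():
    return lookup("call-site target")


def runtime():
    return lookup("frame 0") + lookup("frame 1")


def main():
    parse()
    runtime()


if __name__ == "__main__":
    main()
    for path, (calls, total) in TABLES["lookup"].items():
        print(" -> ".join(path), calls, f"{total:.6f}")
```

Only lookup pays any profiling cost here; parse, runtime, and main run uninstrumented, yet the table still separates the main -> parse -> lookup path from the main -> runtime -> lookup path because the stack walk recovers the full path at each call.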