Incremental Call-Path Profiling
Andrew Bernat
Computer Sciences Department
University of Wisconsin-Madison
Madison, WI 53706
USA
© 2004 Andrew R. Bernat
April 14, 2004
Dynamic Call-Path Profiling
[Figure: call graph with nodes main and do_work]
[Figure: call graph with nodes main, lookup, do_work, MPI_Recv, malloc, hash_lookup, and strcpy]
Point Profiler (Length 1)
[Figure: call graph with nodes main and do_work; annotation: 96% CPU]
Edge Profiler (Length 1)
[Figure: call graph with nodes main and do_work; annotations: 40%, 53%, 96% CPU]
Path Profiler (Length 3)
[Figure: call graph with nodes main and do_work; annotations: 50%, 36%, 40%, 53%, 96% CPU]
Full Call-Path Profiling
[Figure: call graph with nodes main and do_work; annotations: 53%, 43%, 50%, 36%, 40%, 53%, 96% CPU]
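
The point, edge, path, and full call-path views differ only in how much of the call-path is kept when metric values are summed. A minimal Python sketch of that idea follows. The function `aggregate`, the `depth` parameter (which counts functions kept and may not match the slides' notion of length), and the sample values are all hypothetical and are not data from the talk.

```python
from collections import defaultdict


def aggregate(samples, depth=None):
    """Sum a metric over call-path suffixes.

    samples: iterable of (call_path, value) pairs; call_path is a tuple of
             function names ordered from outermost caller to callee.
    depth:   number of trailing functions kept in the key (1 keeps only the
             callee, 2 keeps caller and callee); None keeps the full path.
    """
    if depth is not None and depth < 1:
        raise ValueError("depth must be at least 1")
    totals = defaultdict(float)
    for path, value in samples:
        key = tuple(path) if depth is None else tuple(path[-depth:])
        totals[key] += value
    return dict(totals)


# Hypothetical samples: (call-path, CPU seconds). Not data from the talk.
SAMPLES = [
    (("main", "do_work"), 2.0),
    (("main", "lookup", "do_work"), 5.0),
    (("main", "lookup", "do_work"), 1.0),
]
```

With `depth=1` every sample collapses onto `do_work` alone; with `depth=None` the two distinct routes into `do_work` stay separate, which is the extra information a full call-path profile provides.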
Call-Path Profiling Disassembled
Profiling functions is easy
Determining the call-path is hard
• Efficiency – cost per function invocation
• Safety – must not affect program’s behavior
• Correctness
Call-Path Profilers
Provide path-profile data for every function in
the program.
Two categories:
• Sampling-based (gprof, CPPROF)
• Instrumenting profilers (PP, TAU, others)
Sampling Call-Path Profilers
Periodically pause the program
• Note active function
• Record call-path (current stack)
• Some profilers sample CPU usage
Advantages:
• Complete call-path information
Disadvantages:
• Imprecise (sampling-based)
• Limited metrics available
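
The sampling approach can be sketched in Python as below. This is an illustration under assumptions, not the mechanism used by gprof or CPPROF: a background thread stands in for the timer interrupt a native profiler would use, the class `StackSampler` and its method names are hypothetical, and `sys._current_frames()` is specific to CPython.

```python
import collections
import sys
import threading


class StackSampler:
    """Periodically record the call-path of the thread that called start()."""

    def __init__(self, interval=0.005):
        self.interval = interval              # seconds between samples
        self.counts = collections.Counter()   # call-path -> number of samples
        self._stop = threading.Event()
        self._thread = None
        self._target_id = None

    def record(self, frame):
        """Walk from `frame` to the outermost caller and count that path."""
        path = []
        while frame is not None:
            path.append(frame.f_code.co_name)
            frame = frame.f_back
        self.counts[tuple(reversed(path))] += 1

    def _run(self):
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self.interval):
            frame = sys._current_frames().get(self._target_id)
            if frame is not None:
                self.record(frame)

    def start(self):
        self._target_id = threading.get_ident()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

The sample counts only approximate where time is spent, which is the imprecision noted above: a function that runs briefly between two samples never appears in `counts`.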
Instrumenting Profilers
Track the current call-path
• Stack of active functions
• Maintain a pointer to the current call-path
Record metrics for all functions
• Counters, CPU usage, wall time
Disadvantages:
• Incomplete (can miss recursion, dynamic calls)
• Expensive (instrumentation at entries, exits, and call sites)
• Only supports limited, inexpensive metrics
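
A minimal Python sketch of the instrumenting approach follows, assuming a decorator stands in for instrumentation inserted at function entries and exits. The names `instrument`, `call_stack`, and `path_stats` are hypothetical, and a tuple copy of the shadow stack replaces the pointer into a call-path structure that a real profiler would maintain.

```python
import collections
import functools
import time

call_stack = []   # shadow stack of currently active instrumented functions
path_stats = collections.defaultdict(lambda: {"calls": 0, "wall": 0.0})


def instrument(func):
    """Wrap func so that every entry and exit updates the shadow stack."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_stack.append(func.__name__)       # entry instrumentation
        path = tuple(call_stack)               # current call-path
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:                               # exit instrumentation
            stats = path_stats[path]
            stats["calls"] += 1
            stats["wall"] += time.perf_counter() - start   # inclusive time
            call_stack.pop()
    return wrapper
```

Every instrumented function pays the bookkeeping cost on every invocation, and a caller that was never wrapped is simply absent from the recorded paths. Both effects correspond to the disadvantages listed above.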
Incremental, Dynamic Call-Path Profiling
Incremental: Only profile functions of
interest to the user
• “Paradyn approach”
Dynamic: Allow “on-the-fly” profiling
• Global analysis unnecessary
Cost Effective: Reduce overall cost
Complete: User still gets complete call-path
information
Incremental, Dynamic Call-Path Profiling
Capture the call-path with a stack walk from
within the process.
• Includes dynamic calls and recursion
• Makes tracing function calls unnecessary
Walk the stack at function entries and exits.
Cost only incurred when profiled functions are
executed.
• Allows use of more expensive metrics
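
The stack-walk idea can be illustrated in Python as follows. This is a sketch under assumptions rather than the mechanism described in the talk: the explicit `on_entry()` call stands in for instrumentation inserted at the entry of the one function of interest, all names are hypothetical, and `sys._getframe` is specific to CPython.

```python
import sys

seen = []   # call-paths observed at entries of the profiled function


def walk_stack(frame):
    """Return the call-path ending at `frame`, outermost caller first."""
    path = []
    while frame is not None:
        path.append(frame.f_code.co_name)
        frame = frame.f_back
    return tuple(reversed(path))


def on_entry():
    # sys._getframe(1) is the frame of the function that called on_entry().
    seen.append(walk_stack(sys._getframe(1)))


def lookup(n):              # the only function of interest
    on_entry()
    return 0 if n == 0 else lookup(n - 1)   # recursion shows up in the walk


def do_work(fn):            # not instrumented; calls through a function value
    return fn(1)


def main():
    return do_work(lookup)
```

Neither `main` nor `do_work` carries any instrumentation, yet both appear in the recorded paths, and the recursive call appears as a repeated `lookup` entry. No cost is incurred unless `lookup` runs.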
iPath, a Prototype Incremental Call-Path Profiler
Allows use of arbitrary performance metrics.
• PMAPI (AIX), PAPI (Linux)
• Counters, timers, and arbitrary combinations
Profiles user-selected functions
Uses Dyninst
• Traces unmodified binaries
iPath Implementation
Instrumentation is contained in a run-time
library.
• User defines the desired metrics
Maintain a table for each function profiled
• Stack walk and associated performance data
for each detected call-path
Update the table at function entry and exit
Results available on the fly
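
One way to picture the per-function table is the Python sketch below. It is not iPath's code: the class `FunctionProfile`, its methods, the `counter` dictionary standing in for a hardware counter, and the example functions are all hypothetical.

```python
import collections
import sys


def walk_stack(frame):
    """Return the call-path ending at `frame`, outermost caller first."""
    path = []
    while frame is not None:
        path.append(frame.f_code.co_name)
        frame = frame.f_back
    return tuple(reversed(path))


class FunctionProfile:
    """One table per profiled function: call-path -> [calls, metric total]."""

    def __init__(self, name, metric):
        self.name = name
        self.metric = metric   # user-supplied callable returning a number
        self.table = collections.defaultdict(lambda: [0, 0])
        self._open = []        # (path, metric at entry) for active invocations

    def enter(self):
        path = walk_stack(sys._getframe(1))   # frame of the profiled function
        self._open.append((path, self.metric()))

    def exit(self):
        path, start = self._open.pop()
        row = self.table[path]
        row[0] += 1
        row[1] += self.metric() - start


counter = {"ops": 0}   # hypothetical stand-in for a hardware counter
lookup_profile = FunctionProfile("lookup", metric=lambda: counter["ops"])


def lookup(cost):
    lookup_profile.enter()
    try:
        counter["ops"] += cost   # the work being measured
    finally:
        lookup_profile.exit()


def parse():
    lookup(5)


def runtime():
    lookup(100)
    lookup(100)


def main():
    parse()
    runtime()
```

After `main()` runs, `lookup_profile.table` holds one row per detected call-path, so the path through `runtime` stands out from the path through `parse`. The table can be read at any point during the run, which matches the on-the-fly availability noted above.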
iPath in Action
We applied iPath to two applications: the
Paradyn daemon and the MILC QCD simulation
framework.
Paradyn daemon: identified and fixed a serious
bottleneck in address-to-function mapping.
MILC: identified and fixed a communication
bottleneck.
Paradyn Daemon
Top level: Performance Consultant was slow
Identified a bottleneck in address-to-function
mapping.
• Parsing: target of a call-site
• Runtime: identifying functions on the stack
Call-path analysis showed the lookup function
performed poorly along only one path.
We optimized the function for that path.
Result: 98% decrease in instrumentation time!
MILC
Parallel computation framework for quantum
chromodynamics simulations.
We analyzed MPI performance using iPath and
focused on frequently executed paths.
We identified two bottlenecks, one of which
we fixed.
We reduced the number of MPI function calls
and replaced calls to cut synchronization time.
Result: 45% decrease in execution time
Summary
Call-path profiling is a useful technique, but
current methods are incomplete.
Increase flexibility and reduce cost by
profiling particular functions instead of the
whole program.
Come see the demo!
Questions?