HPCPI and Xtools Version 0.6.6 User's Guide
HP Part Number: 5992-4009
Published: March 2008
Edition: 1
© Copyright 2008 Hewlett-Packard Development Company, L.P.
Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial
Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under
vendor's standard commercial license. The information contained herein is subject to change without notice. The only warranties for HP products
and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as
constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein. UNIX is a registered
trademark of The Open Group.
Trademark Acknowledgements
AMD Opteron is a trademark of Advanced Micro Devices, Inc.
Intel Itanium is a trademark of Intel Corporation in the U.S. and other countries.
Intel Xeon is a trademark of Intel Corporation in the U.S. and other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries.
Hewlett-Packard is independent of Sun Microsystems.
Table of Contents
About This Document.......................................................................................................13
Intended Audience................................................................................................................................13
Document Organization.......................................................................................................................13
Typographic Conventions.....................................................................................................................13
Related Information..............................................................................................................................14
Publishing History................................................................................................................................14
HP Encourages Your Comments..........................................................................................................15
1 Introduction...................................................................................................................17
HPCPI...................................................................................................................................................17
HPCPI Components........................................................................................................................17
HPCPI Sampling Characteristics..........................................................................................................18
Xtools....................................................................................................................................................20
xclus and xcxclus.......................................................................................................................20
xperf and xcxperf.......................................................................................................................21
2 Installing HPCPI and Xtools.........................................................................................25
Installation Requirements.....................................................................................................................25
Patch Requirements.........................................................................................................................25
Product Dependencies.....................................................................................................................25
Hardware Requirements.................................................................................................................25
Firmware Requirements..................................................................................................................25
Operating System Requirements.....................................................................................................25
Software Requirements ...................................................................................................................25
Memory Requirements ...................................................................................................................25
I/O Requirements.............................................................................................................................25
Disk Space Requirements................................................................................................................25
RPM Packages.......................................................................................................................................25
Installing the Software..........................................................................................................................26
Before Installing the Software..........................................................................................................26
Installing HPCPI or Xtools on Systems with Existing Versions......................................................27
Installing the Software on Standalone Systems...............................................................................27
Installing the Software on HP XC Clusters.....................................................................................27
Using the Full Imaging Installation Procedure..........................................................................28
Using the Imaging Installation Procedure with Manual Propagation......................................28
Using cexec to Run RPM on the Clients..................................................................................29
Verifying the Installation .....................................................................................................................30
Removing the Software.........................................................................................................................30
3 Getting Started with HPCPI.........................................................................................31
Simple HPCPI Session..........................................................................................................................31
Step 1: Loading the HPCPI Environment.............................................................................................31
Step 2: Setting the HPCPI Database Environment Variable (HPCPIDB)..............................................31
Step 3: Creating the HPCPI Database Directory...................................................................................32
Step 4: Starting the HPCPI Daemon.....................................................................................................32
Step 5: Running the Code You Want to Analyze..................................................................................32
Step 6: Flushing the HPCPI Data to Disk.............................................................................................32
Step 7: Viewing Per Image Statistics for the System.............................................................................32
Step 8: Viewing Per Procedure Statistics for the Application...............................................................33
Step 9: Viewing Per Instruction Statistics.............................................................................................33
Step 10: Stopping the HPCPI Daemon..................................................................................................33
4 Using HPCPI..................................................................................................................35
Starting HPCPI......................................................................................................................................35
Setting Up the HPCPI Environment................................................................................................35
Selecting a Location for the HPCPI Database Directory.................................................................36
Setting the Default Database Directory Environment Variable (HPCPIDB)......................................36
Starting the hpcpid Daemon..........................................................................................................36
Startup Information for hpcpid ...............................................................................................36
Selecting Events to Monitor.............................................................................................................37
Commonly Used Event Sets.......................................................................................................38
Modifying the Event Interval Value...........................................................................................38
Event Duty Qualifier..................................................................................................................39
Running an Application for Analysis...................................................................................................40
Labeling Data...................................................................................................................................40
Controlling the Daemon with hpcpictl............................................................................................41
Flushing Data to Disk: hpcpictl flush.....................................................................................41
Stopping the Daemon: hpcpictl quit........................................................................................41
Starting a New Data Epoch: hpcpictl epoch.............................................................................41
Displaying HPCPI Status Information: hpcpictl show..............................................................41
Viewing Data with hpcpiprof, hpcpilist, and hpcpitopcounts..............................................43
Default Input Data...........................................................................................................................43
Flushing Data: hpcpictl flush..................................................................................................43
Viewing Per-Image Data: hpcpiprof.................................................................................................44
HPCPI Header.................................................................................................................................44
hpcpiprof Image Data Table........................................................................................................44
hpcpiprof Output with Multiple Events................................................................................45
Viewing Per-Procedure Data: hpcpiprof image_name..................................................................46
HPCPI Procedure Header...............................................................................................................46
hpcpiprof Procedure Data Table..................................................................................................46
Viewing Per-Instruction Data: hpcpilist procedure_name image_name................................47
hpcpilist Header.........................................................................................................................47
hpcpilist Data Table...................................................................................................................47
Interpreting hpcpilist Event Counts..........................................................................................48
Listing the Instructions with the Highest Event Counts: hpcpitopcounts......................................49
hpcpitopcounts Header..............................................................................................................49
hpcpitopcounts Data Table........................................................................................................49
Listing Instructions in an Image: hpcpitopcounts image_name..................................................50
Interpreting hpcpitopcounts Event Counts...............................................................................50
HPCPI Utility Options..........................................................................................................................51
Specifying an Alternate Database....................................................................................................51
Example......................................................................................................................................51
Specifying an Alternate Epoch........................................................................................................51
Example......................................................................................................................................51
Selecting Data by System.................................................................................................................51
Example......................................................................................................................................52
Specifying Events to Display...........................................................................................................52
Examples:...................................................................................................................................52
Selecting Data by Label...................................................................................................................52
Extracting Data for a Process from Shared Image Metrics........................................................52
Specifying an Alternate Sort Key....................................................................................................53
Example......................................................................................................................................53
Displaying Raw Values....................................................................................................................53
Limiting the hpcpiprof Output....................................................................................................54
Additional Options..........................................................................................................................54
Tips and Best Practices for Using HPCPI.............................................................................................55
Tips..................................................................................................................................................55
Using Event Sets.........................................................................................................................55
Stopping the Daemon After You Finish Collecting Data (hpcpictl quit)...........................55
Limiting the Event Count Display (hpcpiprof -keep Option)............................................56
Using Database Directories, Epochs, or Labels to Organize Your Data....................................56
Database Directories.............................................................................................................56
Epochs...................................................................................................................................56
Labels....................................................................................................................................56
Event Intervals.................................................................................................................................56
Multiple Duty Groups.....................................................................................................................56
Itanium Instruction Metrics.............................................................................................................57
Measuring Memory Controller and HyperTransport Events.........................................................57
HyperTransport Transmit and Receive Events..........................................................................57
5 Using HPCPI Labels......................................................................................................59
Overview...............................................................................................................................................59
Simple HPCPI Session Using Labels....................................................................................................60
Step 1: Setting Up the Environment and Starting the Daemon.......................................................60
Step 2: Establishing the Label and Running the Application..........................................................60
Step 3: Flushing the Data.................................................................................................................60
Step 4: Using the Label with hpcpiprof.......................................................................................60
Step 5: Stopping the HPCPI Daemon..............................................................................................61
Label Selectors.......................................................................................................................................62
Selector Operators...........................................................................................................................62
-not Operator............................................................................................................................63
-and Operator............................................................................................................................63
-or Operator..............................................................................................................................63
-equiv Operator.......................................................................................................................63
Operator Syntax.........................................................................................................................63
Multiple Labels.....................................................................................................................................64
Reusing Labels......................................................................................................................................64
Comparing Epochs and Labels.............................................................................................................64
Using Epochs with Labels...............................................................................................................64
Label Examples.....................................................................................................................................65
Existing Processes: -pid pid........................................................................................................65
Using Labels with Application Arguments.....................................................................................65
Using the Application Argument in the Label Name................................................................65
Utilities that Spawn Processes: -pgid this.................................................................................65
Spawned Processes without the Originator: -pgid this -pid this -not......................65
All Processes: -pid -1 -not........................................................................................................65
Kernel Idle Data: -pid 0................................................................................................................66
Kernel Idle Data Per CPU...........................................................................................................66
Creating Labels in Programs................................................................................................................67
C Code Example..............................................................................................................................67
Notes...........................................................................................................................................68
Fortran Code Example.....................................................................................................................68
6 Using HPCPI on an HP XC Cluster.............................................................................69
Overview...............................................................................................................................................69
Using Labels with mpirun and Other Distribution Utilities..........................................................69
Collecting Data on Multiple Nodes......................................................................................................70
Consolidating and Synchronizing Data..........................................................................................70
Selecting Output Data for Specific Systems....................................................................................70
Example Using HP-LSF, SLURM, and MPI.....................................................................................70
Creating the Common HPCPI Directory and Epoch.................................................................70
Submitting the Job......................................................................................................................71
prolog File..................................................................................................................................71
epilog File...................................................................................................................................72
Collecting Data on One Node...............................................................................................................73
Starting a Distribution Utility from hpcpictl label.................................................................73
7 Using Xtools..................................................................................................................75
Xtools Overview....................................................................................................................................76
Using xclus and xcxclus..................................................................................................................76
Starting xclus and xcxclus...............................................................................................................78
Step 1: Setting Up the Xtools Environment.....................................................................................78
Step 2: Setting the DISPLAY Environment Variable........................................................................78
Step 3: Starting xclus or xcxclus................................................................................................78
Starting xclus...........................................................................................................................78
Specifying Nodes with xclus..............................................................................................78
Starting xcxclus.......................................................................................................................78
Specifying Nodes with xcxclus.........................................................................................79
Specifying Nodes for xclus or xcxclus.................................................................................79
Specifying Nodes with -nodes............................................................................................79
Creating a Cluster File..........................................................................................................79
Viewing xclus and xcxclus Displays...............................................................................................81
Viewing xclus (Enhanced) Itanium Icons.....................................................................................82
Viewing xclus (Enhanced) Single-Core and Dual-Core AMD Opteron Node Icons...................83
Viewing xclus (Enhanced) Native Quad-Core AMD Opteron Node Icons..................................84
Viewing xcxclus (Generic) Node Icons........................................................................................85
Showing Statistic Names and Descriptions.....................................................................................86
Showing Bandwidth or Utilization Rates........................................................................................86
Showing HyperTransport Data Statistics or Data and Control Statistics........................................86
Changing the Refresh Rate..............................................................................................................86
Hiding Statistic Values.....................................................................................................................86
Suspending the Display...................................................................................................................86
Modifying the Display Size and Layout..........................................................................................86
Using Enhanced (xclus) Menu Options........................................................................................87
Using Generic (xcxclus) Menu Options.......................................................................................87
Recording, Replaying, and Plotting xclus and xcxclus Data..........................................................89
Recording Data................................................................................................................................89
Suppressing the Display (-quiet)............................................................................................89
Replaying Data................................................................................................................................89
Plotting Data....................................................................................................................................89
Starting xperf or xcxperf from xclus or xcxclus........................................................................92
Viewing Grouped Nodes......................................................................................................................92
Viewing Individual Node Icons......................................................................................................92
Controlling Group Displays............................................................................................................92
Modifying the Parameters that Define a Group........................................................................92
Using xperf and xcxperf..................................................................................................................94
Starting xperf and xcxperf...............................................................................................................94
Viewing xperf and xcxperf Displays...............................................................................................95
Viewing Itanium xperf (Enhanced) Statistics...............................................................................96
CPU............................................................................................................................................96
Instructions.................................................................................................................................96
FPC.............................................................................................................................................96
Cycles.........................................................................................................................................96
Cache..........................................................................................................................................96
SysBus.........................................................................................................................................97
Branch.........................................................................................................................................97
Sum I/O B/W...............................................................................................................................97
DMA B/W...................................................................................................................................97
Viewing AMD Opteron xperf (Enhanced) Statistics.....................................................................98
CPU............................................................................................................................................99
IPC..............................................................................................................................................99
FPC.............................................................................................................................................99
Cycles.........................................................................................................................................99
Execution....................................................................................................................................99
Dcache........................................................................................................................................99
Icache..........................................................................................................................................99
Branch.......................................................................................................................................100
DRAM.......................................................................................................................................100
Memory (Third Generation AMD Opteron Only)...................................................................100
CpuloRequests (AMD Opteron Only)......................................................................................100
HTn (HyperTransport Links)...................................................................................................100
Viewing xcxperf (Generic) Statistics...........................................................................................101
CPU...........................................................................................................................................101
Disk...........................................................................................................................................102
NFS...........................................................................................................................................102
Lustre........................................................................................................................................102
Infiniband.................................................................................................................................102
Ethernet....................................................................................................................................102
Interrupts..................................................................................................................................102
ContextSwitch..........................................................................................................................102
Sockets......................................................................................................................................102
Elan...........................................................................................................................................103
Memory....................................................................................................................................103
Swap.........................................................................................................................................103
VMalloc.....................................................................................................................................103
Displaying Color Legends and Creating Tear-Away Legends......................................................104
Hiding or Showing Graphs...........................................................................................................104
Showing I/O Bandwidth or Utilization Rates................................................................................104
Showing Cycles Per Instruction or Instructions Per Cycle............................................................104
Modifying Graph Colors and Line Widths...................................................................................104
Using xperf (Enhanced) Menu Options......................................................................................104
Using xcxperf (Generic) Menu Options.....................................................................................105
Starting an HPCPI Label from xperf................................................................................................106
Recording, Replaying, and Plotting xperf and xcxperf Data........................................................107
Displaying System Information with xperf or xcxperf.................................................................108
Viewing Generic Data with xclus or xperf.....................................................................................109
Viewing Enhanced Data with xcxclus or xcxperf........................................................................110
Xtools Daemons..................................................................................................................................111
A Product Specifications...............................................................................................113
HPCPI Database Directories and Files................................................................................................113
Examples........................................................................................................................................113
hpcpicat Output..............................................................................................................................114
HPCPI Product Limitations................................................................................................................115
Skid................................................................................................................................................115
Attribution Issues..........................................................................................................................115
Inline Routines..........................................................................................................................115
Multi-Issue Architectures.........................................................................................................116
Calls to exec().................................................................................................................................116
Unknown Locations.......................................................................................................................116
Mandated Duty Groups.................................................................................................................116
Active Fraction Changes................................................................................................................116
B HPCPI Quick Reference.............................................................................................117
Starting HPCPI....................................................................................................................................117
Stopping and Controlling HPCPI.......................................................................................................117
Viewing HPCPI Data..........................................................................................................................118
Controlling Input and Output for HPCPI Utilities.............................................................................118
C Xtools Quick Reference.............................................................................................119
xclus and xcxclus Tasks.................................................................................................................119
Starting xclus or xcxclus..........................................................................................................119
Modifying xclus and xcxclus Displays....................................................................................119
Recording, Replaying, and Plotting xclus or xcxclus Data.....................................................120
xperf and xcxperf Tasks.................................................................................................................121
Starting xperf or xcxperf................................................................................................................121
Modifying xperf and xcxperf Displays....................................................................................121
Additional xperf and xcxperf Tasks.........................................................................................122
Glossary.........................................................................................................................123
Index...............................................................................................................................125
List of Figures
1-1   xclus Display for AMD Opteron Systems..................................................................................21
1-2   xperf Display for an Itanium System..........................................................................................22
7-1   xclus Display for Itanium Systems.............................................................................................81
7-2   Itanium xclus Display.................................................................................................................82
7-3   Four Single-Core AMD Opteron xclus Display..........................................................................83
7-4   Native Quad-Core AMD Opteron xclus Display.......................................................................84
7-5   Generic Node xcxclus Display...................................................................................................85
7-6   CPU Description Window.............................................................................................................86
7-7   Plotted Data from xclus..............................................................................................................91
7-8   xclus Group Icon.........................................................................................................................92
7-9   xperf Display for an AMD Opteron System...............................................................................98
7-10  xcxperf Display........................................................................................................................101
7-11  Displaying the CPU Color Legend..............................................................................................104
7-12  System Information Display........................................................................................................108
A-1   HPCPI Database..........................................................................................................................114
List of Tables
1-1   Processors that Support Enhanced Statistics.................................................................................20
1-2   Processors that Support Generic Statistics....................................................................................20
1-3   Statistics for xclus and xcxclus......................................................................................................21
1-4   Statistics for xperf and xcxperf.....................................................................................................23
4-1   Commonly Used Event Sets..........................................................................................................38
7-1   xclus (Enhanced) Menu Options................................................................................................87
7-2   xcxclus (Generic) Menu Options................................................................................................88
7-3   xperf (Enhanced) Menu Options...............................................................................................104
7-4   xcxperf (Generic) Menu Options..............................................................................................105
B-1   Starting HPCPI............................................................................................................................117
B-2   Stopping and Controlling HPCPI................................................................................................117
B-3   Viewing HPCPI Data...................................................................................................................118
B-4   Controlling Input and Output for HPCPI Utilities.....................................................................118
C-1   Starting xclus or xcxclus........................................................................................................119
C-2   Modifying xclus or xcxclus Displays....................................................................................119
C-3   Recording, Replaying, and Plotting xclus or xcxclus Data...................................................120
C-4   Starting xperf or xcxperf........................................................................................................121
C-5   Modifying xperf and xcxperf Displays..................................................................................121
C-6   Additional xperf and xcxperf Tasks.......................................................................................122
About This Document
This document describes how to install and use the HPCPI and Xtools performance analysis
tools on Linux systems running on HP Integrity Servers.
Intended Audience
This document is intended for programmers with Linux experience and knowledge of Intel®
Itanium® or AMD Opteron™ processor architecture.
Document Organization
This document is organized as follows:
Chapter 1: “Introduction”
    This chapter provides an overview of the product components.
Chapter 2: “Installing HPCPI and Xtools”
    This chapter describes how to install the product components.
Chapter 3: “Getting Started with HPCPI”
    This chapter shows a simple HPCPI user session with the basic HPCPI utilities and features.
Chapter 4: “Using HPCPI”
    This chapter describes how to start HPCPI and collect and analyze HPCPI data.
Chapter 5: “Using HPCPI Labels”
    This chapter describes how to use HPCPI labels, which enable you to select HPCPI data according to process characteristics.
Chapter 6: “Using HPCPI on an HP XC Cluster”
    This chapter describes additional tasks and features to use with HPCPI in an HP XC cluster.
Chapter 7: “Using Xtools”
    This chapter describes how to use Xtools (the GUI utilities xclus, xcxclus, xperf, and xcxperf).
Appendix A: “Product Specifications”
    This appendix contains file information and other reference information.
Appendix B: “HPCPI Quick Reference”
    This appendix contains quick reference information for HPCPI.
Appendix C: “Xtools Quick Reference”
    This appendix contains quick reference information for Xtools.
Typographic Conventions
This document uses the following typographical conventions:
%, $, or #
    A percent sign represents the C shell system prompt. A dollar sign represents the system prompt for the Bourne, Korn, and POSIX shells. A number sign represents the superuser prompt.
audit(5)
    A manpage. The manpage name is audit, and it is located in Section 5.
Command
    A command name or qualified command phrase.
Computer output
    Text displayed by the computer.
Ctrl+x
    A key sequence. A sequence such as Ctrl+x indicates that you must hold down the key labeled Ctrl while you press another key or mouse button.
ENVIRONMENT VARIABLE
    The name of an environment variable; for example, PATH.
[ERROR NAME]
    The name of an error, usually returned in the errno variable.
Key
    The name of a keyboard key. Return and Enter both refer to the same key.
Term
    The defined use of an important word or phrase.
User input
    Commands and other text that you type.
Variable
    The name of a placeholder in a command, function, or other syntax display that you replace with an actual value.
[]
    The contents are optional in syntax. If the contents are a list separated by |, you can choose one of the items.
{}
    The contents are required in syntax. If the contents are a list separated by |, you must choose one of the items.
...
    The preceding element can be repeated an arbitrary number of times.
⋮
    Indicates the continuation of a code example.
|
    Separates items in a list of choices.
WARNING
    A warning calls attention to important information that if not understood or followed results in personal injury or nonrecoverable system problems.
CAUTION
    A caution calls attention to important information that if not understood or followed results in data loss, data corruption, or damage to hardware or software.
IMPORTANT
    An important provides essential information to explain a concept or to complete a task.
NOTE
    A note contains additional information to emphasize or supplement important points of the main text.
Related Information
The following documents contain information about Intel Itanium architecture and performance
events:
•
Intel® Itanium® 2 Processor Reference Manual for Software Development and Optimization.
Document Number 251110.
•
Dual-Core Update to the Intel® Itanium® Processor Reference Manual for Software Development
and Optimization. Document Number 308065.
•
Intel® Itanium® 2 Processor Specification Update. Document Number 251141.
The following documents contain information about AMD Opteron architecture and performance
events:
• BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h Processors. Publication Number
31116.
• BIOS and Kernel Developer's Guide for AMD NPT Family 0Fh Processors. Publication Number
32559.
The following documents contain information about HP XC software:
• HP XC System Software: Administration Guide Version 3.2.1
• HP XC System Software User's Guide
Publishing History
The document printing date and part number indicate the document’s current edition. The
printing date changes when a new edition is printed. Minor changes can be made at reprint
without changing the printing date. The document part number changes when extensive changes
are made. Document updates can be issued between editions to correct errors or document
product changes. To ensure that you receive the updated or new editions, subscribe to the
appropriate product support service. See your HP sales representative for details.
Manufacturing Part Number: 5992-4009
Supported Operating Systems: 2.6-based versions of Red Hat Linux
Supported Versions: Version 0.6.6
Edition Number: 1
Publication Date: March 2008
HP Encourages Your Comments
HP encourages your comments concerning this document. We are committed to providing
documentation that meets your needs. Send any errors found, suggestions for improvement, or
compliments to:
[email protected]
Include the document title, manufacturing part number, and any comment, error found, or
suggestion for improvement you have concerning this document.
1 Introduction
The HP Continuous Profiling Infrastructure (HPCPI) and Xtools are performance analysis tools
for Linux systems running on HP Integrity Servers. HPCPI enables you to analyze the performance
and execution of programs and to identify ways to improve runtime performance. You can also
use HPCPI to analyze CPU events for a system.
The Xtools are performance visualization tools that enable you to monitor the performance and
resource utilization of a group of systems or nodes in an HP XC cluster, and to monitor the
performance of individual systems.
HPCPI
HPCPI provides low-overhead continuous profiling of images (executables, shared libraries, the
kernel, and loadable modules). HPCPI is a statistical sampling profiler that provides data based
on periodic sampling of hardware performance counters from the Performance Monitoring Unit
(PMU) of the microprocessor. The sample intervals are event based; at every nth occurrence of
an event (such as a CPU cycle), HPCPI records the location of the instruction pointer.
HPCPI enables you to do the following tasks:
• Analyze applications without recompiling them.
You do not need to use any special link options or libraries when compiling programs that
you want to analyze.
• Display performance data at multiple levels of granularity.
HPCPI utilities can display performance data for the following:
— A system. HPCPI partitions the data by image (binary file).
— An image. HPCPI partitions the data by procedure.
— A procedure in an image. HPCPI partitions the data by line of source code and assembly
instruction.
Viewing performance data partitioned per image for an application or system enables you
to identify which images are having the most effect on performance.
Viewing performance data partitioned per procedure or per line of instruction enables you
to identify areas of the application that are executed frequently, or areas where execution
is delayed because data or resources are not available.
• Use labels to isolate performance data for processes, process groups, users, or CPUs.
You can associate labels with performance data according to characteristics such as process
ID (PID), parent PID, process group ID, user ID, and CPU number. A label can also isolate
performance data for code executed in a shared image (such as a library) that was called
from a specific process. You can also assign labels to specific sections of code (you must
insert instructions and recompile programs to do this; “Creating Labels in Programs”
(page 67) includes C and Fortran code examples).
• Reanalyze and reformat HPCPI reports without retaking performance measurements. HPCPI
automatically saves performance data in databases, which enables you to run the HPCPI
analysis tools multiple times with the same data.
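Chapter 3 (page 31) walks through a first session step by step. As a quick orientation, a minimal session follows the sketch below; the database path and application name are placeholders, and the exact commands, including how to load the HPCPI environment, are given in Chapter 3.

$ export HPCPIDB=/tmp/hpcpidb    # set the database environment variable (placeholder path)
$ mkdir -p $HPCPIDB              # create the database directory
$ hpcpid                         # start the HPCPI daemon
$ ./myApp                        # run the code you want to analyze (placeholder)
$ hpcpictl flush                 # flush the HPCPI data to disk
$ hpcpiprof                      # view per-image statistics for the system
$ hpcpictl quit                  # stop the daemon when you finish collecting data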
HPCPI Components
The HPCPI profiling system consists of the following components:
• A kernel driver that retrieves data from the PMU.
• The hpcpid daemon that retrieves data from the driver and writes it to the disk.
• The hpcpictl utility that controls the daemon.
• The following utilities that display and analyze HPCPI data:
—  hpcpiprof
   The hpcpiprof utility displays performance profiles for systems (per image) or images (per procedure). The following excerpt from hpcpiprof output shows the number of CPU cycles used per image on a system:

   CPU_CYCLES      %     cum%  image
   ----------  -----   ------  ----------------------------
     283629e6  96.9%    96.9%  vmlinux-2.6.9-34.7hp.XCsmp
       3824e6   1.3%    98.2%  libm-2.3.4.so
       2117e6   0.7%    98.9%  sum
            :
            :

   The following excerpt from hpcpiprof output shows CPU utilization statistics for procedures in the image myApp:

   CPU_CYCLES      %     cum%  procedure    image
   ----------  -----   ------  -----------  -----
     191201e7  99.3%    99.3%  routine1     myApp
       1309e7   0.7%   100.0%  unknown_rou  myApp

—  hpcpilist
   The hpcpilist utility lists per-line performance statistics for a procedure. The following is an excerpt from hpcpilist output:

   CPU_CYCLES  PC               B  ASM
   ----------  ---------------  -  --------------------------------
            :
            :
       2333e6  routine1+0x0030     ldfs   f7=[r34]
            0  routine1+0x0031     nop.f  0
            0  routine1+0x0032     addl   r14=152,gp;;
       2716e6  routine1+0x0040     ldfs   f6=[r14];;
            :
            :

—  hpcpitopcounts
   The hpcpitopcounts utility lists the instructions with the most counts for performance events.

—  hpcpicat
   The hpcpicat utility displays the contents of a performance data file with minimal formatting. This utility is primarily a debugging tool for advanced users who want to create applications that parse and format performance data; it is not intended for general performance profiling.
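The command forms below summarize how the viewing utilities are invoked, using the example image and procedure names from the excerpts above (myApp and routine1); the options these utilities accept are described in Chapter 4.

$ hpcpiprof                    # per-image statistics for the system
$ hpcpiprof myApp              # per-procedure statistics for the image myApp
$ hpcpilist routine1 myApp     # per-instruction statistics for one procedure
$ hpcpitopcounts               # instructions with the highest event counts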
HPCPI Sampling Characteristics
HPCPI is a statistical sampling profiler that uses the PMU. When n events (such as CPU cycles
or cache misses) occur, the PMU triggers a performance monitoring interrupt. The interrupt
handler records the instruction pointer of the interrupted code and creates a sample based on
the instruction pointer and the triggering event. The interrupt handler then reprograms the PMU
for another sampling interrupt and returns. The following sections describe HPCPI sampling
characteristics:
Inherent Limitations of Statistical Sampling
Performance profilers based on statistical sampling have some inherent limitations because each sample represents many events, but not all of the events occur at the recorded instruction pointer. However, many samples accumulate for frequently-executed code regions, and the probability of taking a sample at a location is proportional to the number of actual event occurrences at that location. Over time, the statistical histogram of samples approaches the corresponding proportion of actual event occurrences; the greater the number of samples, the closer the statistical correspondence. Therefore, the statistical event samples provide a reasonably accurate profile of actual event distributions in a program.
Comparison with End-to-End Event Counts
Some profilers monitor the total number of events that occur during a time interval, such as the duration of a program. These end-to-end event counts are usually accurate, even for short programs, because they are direct measurements and not statistical. Such measurements are also much less intrusive because they do not require periodic sampling interrupts. However, unlike sampled event counts, end-to-end event counts do not provide any information about event distribution, or where the events occurred in the program. You can use sampled event counts to deduce end-to-end event counts by summing the corresponding event counts. However, you cannot deduce the distribution of events from end-to-end counts; you must use a sampling profiler to determine event distribution.
Sampling Multiple Events and Calculating Metrics
An advantage of HPCPI is that it can sample more than one event at a time. This enables you to use counts for multiple event types to calculate metrics that characterize software performance. For example, you can determine the instructions per cycle (IPC) metric for a procedure by dividing the number of retired instructions for the procedure by the number of CPU cycles for the procedure. A high IPC is usually desirable; a low IPC typically indicates inefficiencies and areas where performance can be improved. Analyzing metrics such as IPC is often more useful than analyzing single event counts, such as time-based CPU cycles, to determine where and how code can be optimized.
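For example (the counts here are invented for illustration), if HPCPI reports that a procedure retired 3200e6 instructions while consuming 1600e6 CPU cycles, its IPC is:

   IPC = instructions retired / CPU cycles
       = 3200e6 / 1600e6
       = 2.0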
Sampling More Events than the Number of Hardware PMU Counters
A significant advantage
of HPCPI is that it can sample more events than the number of hardware PMU counters. Users
can configure HPCPI to monitor more events than the number of hardware event counters
available for the processor PMU. HPCPI places the events in duty groups and multiplexes (cycles
through) the duty groups so that only the events in one duty group are monitored at any time,
and adjusts event counts for multiplexing. Users can run an application once while monitoring
a large number of events and use HPCPI options to select the event data displayed.
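For example, suppose hpcpid multiplexes eight events across four hardware counters in two
duty groups, so that each event is active in the PMU about half the time (these numbers are
illustrative only). If HPCPI records 1000 samples for one of those events at an interval of 60000,
it scales the estimated count from 1000 × 60000 = 60000000 to approximately 120000000 events
to compensate for the 50 percent active fraction.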
Xtools
The Xtools utilities are X11 clients with GUIs that enable you to monitor the performance of
multiple systems and individual systems. The Xtools bundle consists of the following utilities:
• xclus
• xcxclus
• xperf
• xcxperf
xclus and xcxclus
The xclus and xcxclus utilities enable you to monitor performance and resource utilization
for multiple systems or nodes in a cluster. By default, xclus displays processor-specific statistics,
or enhanced statistics, for the processors listed in Table 1-1, and generic statistics for all other
processor types.
Table 1-1 Processors that Support Enhanced Statistics
Intel Itanium model numbers 900 or higher (previously referred to as Intel Itanium 2 processors)
Second-Generation AMD Opteron (single-core and dual-core AMD Opteron)
Third-Generation AMD Opteron (native quad-core AMD Opteron, sometimes referred to as Barcelona)
The xcxclus utility is the HP XC variant of xclus. It uses components of HP XC software and
may not function properly on systems without HP XC software. By default, xcxclus monitors
only the nodes in your current job allocation and displays generic statistics for the processors
listed in Table 1-2.
Table 1-2 Processors that Support Generic Statistics
Intel Itanium model numbers 900 or higher (previously referred to as Intel Itanium 2 processors)
Second-Generation AMD Opteron (single-core and dual-core AMD Opteron)
Third-Generation AMD Opteron (native quad-core AMD Opteron, sometimes referred to as Barcelona)
Other x86-64 processors, such as Intel Xeon™ processors
Figure 1-1 shows an xclus display for nine systems, each with four single-core Opteron
processors. Table 1-3 (page 21) lists the statistics that xclus and xcxclus display.
Figure 1-1 xclus Display for AMD Opteron Systems
Table 1-3 Statistics for xclus and xcxclus
xclus Statistics (Enhanced)
For Itanium and Opteron processors:
• CPU utilization
For Itanium processors only:
• Front-side bus (FSB) activity
• Memory interface data (MID) bus activity
• I/O bus activity
For Opteron processors only:
• DRAM activity
• HyperTransport activity

xcxclus Statistics (Generic)
• Processor activity
• Ethernet activity
• Physical memory utilization
• Interconnect I/O (Gigabit Ethernet, Infiniband, and Elan Quadrics QsNetII)
• Disk I/O (for nodes with attached disks)
xperf and xcxperf
The xperf and xcxperf utilities enable you to monitor performance and resource utilization
for individual systems. By default, xperf displays enhanced statistics for the processor types
listed in Table 1-1 (page 20), and generic statistics for all other processor types.
The xcxperf utility is the HP XC variant of xperf. It uses components of HP XC software and
may not function properly on systems without HP XC software. By default, xcxperf displays
generic statistics for the processor types listed in Table 1-2 (page 20).
Figure 1-2 shows an xperf display for an Itanium system. Table 1-4 lists a summary of the
statistics that xperf and xcxperf display, and Chapter 7 (page 75) contains detailed lists of
the statistics that xperf and xcxperf report.
Figure 1-2 xperf Display for an Itanium System
Table 1-4 Statistics for xperf and xcxperf
xperf Statistics (Enhanced)
For Itanium and Opteron processors:
• CPU utilization
• Instructions per cycle
• Floating point operations retired per cycle (FPC)
For Itanium processors only:
• Per cycle statistics for numerous execution and stall events
• Cache miss events
• System bus utilization
• I/O bus activity
• Direct Memory Access (DMA) bus activity
For Opteron processors only:
• Per cycle statistics for numerous execution and stall events
• Instruction and data cache events
• Execution dispatch events
• Branch metrics
• DRAM events
• Memory events
• HyperTransport link utilization

xcxperf Statistics (Generic)
• CPU utilization (for the user and the system)
• Disk activity
• NFS activity
• Lustre activity
• Elan (Quadrics QsNetII interconnect) activity
• Infiniband activity
• Ethernet activity
• Interrupts per second
• Context switches per second
• Socket statistics
• Physical memory utilization
• Swap utilization
• Virtual memory utilization
2 Installing HPCPI and Xtools
This chapter describes the installation requirements and procedures for HPCPI and Xtools. This
chapter addresses the following topics:
• “Installation Requirements” (page 25)
• “RPM Packages” (page 25)
• “Installing the Software” (page 26)
• “Verifying the Installation ” (page 30)
• “Removing the Software” (page 30)
Installation Requirements
This section contains installation requirements.
Patch Requirements
See the HPCPI and Xtools Release Notes for any patch requirements.
Product Dependencies
The xcxclus and xcxperf utilities (the xtools-xc_clients RPM package) require the HP
XC software.
Hardware Requirements
The HPCPI and Xtools bundle is supported on HP ProLiant servers and HP XC cluster platforms
with the processors listed in Table 1-2 (page 20). The xclus and xperf utilities display enhanced
statistics for the processors listed in Table 1-1 (page 20). On systems without the processor types
listed, the xclus and xperf utilities display generic statistics.
Firmware Requirements
None.
Operating System Requirements
The HPCPI and Xtools bundle is supported on 2.6-based versions of Red Hat Linux.
Software Requirements
The xcxclus and xcxperf utilities require HP XC software.
Memory Requirements
A minimum of 256 MB per processor core.
I/O Requirements
None.
Disk Space Requirements
40 MB (20 MB for the HPCPI package and 20 MB for the Xtools packages).
RPM Packages
The HPCPI and Xtools bundle contains the following RPM packages:
• hpcpi
This package contains all the files necessary to use HPCPI.
• xtools-common
This package contains files and utilities that are common to xclus and xperf, and to
xcxclus and xcxperf (the HP XC variants of xclus and xperf). You must install this
package if you are installing the xtools-clients or xtools-xc_clients package.
• xtools-clients
This package contains xclus and xperf and associated files. You must also install the
xtools-common package to use xclus and xperf.
• xtools-xc_clients
This package contains xcxclus and xcxperf and associated files. You must also install
the xtools-common package to use xcxclus and xcxperf.
The package file names have the following format:
package_name-version-os.platform.rpm
Where:
package_name   Identifies the package (hpcpi, xtools-common, xtools-clients, or
               xtools-xc_clients).
version        Identifies the bundle version, such as 0.6.6.
os             Identifies the operating system and version, such as rh4.
platform       Specifies the platform:
               • ia64 denotes the Itanium platform (such as CP6000)
               • x86_64 denotes the AMD Opteron and other x86 64-bit processors,
                 including Intel Xeon processors (such as CP3000 and CP4000)
For example, a bundle can contain the following files:
hpcpi-0.6.6-rh4.ia64.rpm
xtools-common-0.6.6-rh4.ia64.rpm
xtools-clients-0.6.6-rh4.ia64.rpm
xtools-xc_clients-0.6.6-rh4.ia64.rpm
Installing the Software
This section describes how to install the software.
Before Installing the Software
Before installing the software, verify that no one is using HPCPI or Xtools by verifying that no
instances of the following programs are running:
hpcpicat
hpcpictl
hpcpid
hpcpilist
hpcpiprof
hpcpitopcounts
xclus
xperf
xcxclus
xcxperf
apmond
clusmond
HP also recommends that you install the HPCPI or Xtools software when the system is idle to
minimize the effects of the installation procedure on other computing tasks. You can use the
SLURM scontrol command with the State=drain parameter to enable existing jobs to
complete on a node and prevent new jobs from starting.
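For example, the following SLURM command (the node name n15 is hypothetical) drains a node
so that existing jobs complete but no new jobs start:
# scontrol update NodeName=n15 State=drain Reason="HPCPI installation"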
Installing HPCPI or Xtools on Systems with Existing Versions
If you are installing HPCPI or Xtools on a system that already has a version of the software
installed, do not update the version; that is, do not use the rpm -U option. Instead, you must
use the rpm -e command to remove the previous version before installing the new version.
Installing the Software on Standalone Systems
To install the HPCPI and Xtools packages on a standalone system, follow these steps:
1. Log in as superuser.
2. If previous versions of the packages are already installed on the system, use rpm -e to
remove them:
% rpm -e package_file package_file...
IMPORTANT: Do not use the rpm -U command to update previous versions of the packages.
3. Use rpm to install the packages as follows:
% rpm -ivh package_file package_file...
For example:
% rpm -ivh hpcpi-0.6.6-rh4.ia64.rpm \
xtools-common-0.6.6-rh4.ia64.rpm \
xtools-clients-0.6.6-rh4.ia64.rpm \
xtools-xc_clients-0.6.6-rh4.ia64.rpm
Installing the Software on HP XC Clusters
You can use standard HP XC Image Replication and Distribution Environment procedures to
install the HPCPI and Xtools packages on HP XC nodes. This section describes three methods
to install the HPCPI and Xtools software on HP XC cluster nodes; the first two methods are
standard procedures described in HP XC System Software: Administration Guide. The methods are
as follows:
•
Use the full imaging installation procedure to create a golden image and propagate that
image. This installation procedure is recommended by the HP XC documentation. The
disadvantages of using this procedure are as follows:
— You cannot install software on the clients until you have created the golden image on
the head node.
— You must reimage and reboot the client nodes.
•
Use the HP XC Image Replication and Distribution Environment to create a golden image,
but use cexec to manually run si_updateclient and other commands on the client
nodes. This installation procedure is described in the HP XC documentation. The advantage
of this method is that you do not need to reimage and reboot the client nodes. The
disadvantages of using this procedure are as follows:
— You cannot install software on the clients until you have created the golden image on
the head node.
— You must manually run commands that would be executed automatically by RPM.
•
Use the cexec command to run the RPM utility on the client nodes, and create the golden
image after the clients have been updated. The advantages of using this procedure are as
follows:
—
—
You can immediately install software on the clients; you do not have to wait until you
have created the golden image on the head node.
You do not have to manually run commands that are automatically executed by RPM.
The disadvantage of this method is that it is not a standard HP XC installation procedure.
These procedures are described in the following sections. To use these procedures, you must
have the HP XC Image Replication and Distribution Environment configured as described in HP
XC System Software: Administration Guide.
Using the Full Imaging Installation Procedure
To install the HPCPI and Xtools packages using the full imaging installation procedure for HP
XC clusters, follow these steps:
1. Log in as superuser on the head node.
2. If previous versions of the packages are already installed on the system, use rpm -e to
remove them:
# rpm -e package_file package_file...
IMPORTANT: Do not use the rpm -U command to update previous versions of the packages.
3. Use RPM to install the packages as follows:
# rpm -ivh package_file package_file...
4. Create a new golden image as follows:
# updateimage --gc `nodename`
Where `nodename` resolves to the name of the local node, which is the head node and
image server.
5. Stop and restart all nodes in the cluster:
# stopsys
# startsys -image_and_boot
For example:
# rpm -ivh hpcpi-0.6.6-rh4.ia64.rpm \
xtools-common-0.6.6-rh4.ia64.rpm \
xtools-clients-0.6.6-rh4.ia64.rpm \
xtools-xc_clients-0.6.6-rh4.ia64.rpm
# updateimage --gc `nodename`
# stopsys
# startsys -image_and_boot
Using the Imaging Installation Procedure with Manual Propagation
To install the HPCPI and Xtools packages using the imaging installation procedure for HP XC
clusters with manual propagation, follow these steps:
1. Log in as superuser on the head node.
2. If previous versions of the packages are already installed on the system, use rpm -e to
remove them:
# rpm -e package_file package_file...
NOTE: Do not use the rpm -U command to update previous versions of the packages.
3. Use RPM to install the packages on the head node as follows:
# rpm -ivh package_file package_file...
4. Set the shell variable nn to `nodename` to shorten the commands in the remainder of this
procedure:
# nn=`nodename`
Where `nodename` resolves to the name of the local node, which is the head node and
image server. (The nn shell variable is used in the cexec commands to exclude the local
node from the command execution.)
5. Create a new golden image but do not set the clients for network reboot as follows:
# updateimage --gc `nodename` --no-netboot
6. Use cexec to copy the image file to the client nodes, run si_updateclient, manually
start the hpcpi subsystem, and restart xinetd. If the packages were previously installed
on the client, you must also remove them before running si_updateclient. For example:
# cexec -x $nn -a scp \
$nn:/etc/systemimager/updateclient.local.exclude \
/etc/systemimager
# cexec -x $nn -a 'si_updateclient \
--quiet --server headNode --image base_image \
--no-bootloader --override base_image'
# cexec -x $nn chkconfig --add hpcpi
# cexec -x $nn /sbin/service hpcpi start
# cexec -x $nn /sbin/service xinetd restart
Where headNode is the name of the head node.
For example:
# rpm -ivh hpcpi-0.6.6-rh4.ia64.rpm \
xtools-common-0.6.6-rh4.ia64.rpm \
xtools-clients-0.6.6-rh4.ia64.rpm \
xtools-xc_clients-0.6.6-rh4.ia64.rpm
# nn=`nodename`
# updateimage --gc $nn --no-netboot
# cexec -x $nn -a scp \
$nn:/etc/systemimager/updateclient.local.exclude \
/etc/systemimager
# cexec -x $nn -a 'si_updateclient \
--quiet --server myHeadNode --image base_image \
--no-bootloader --override base_image'
# cexec -x $nn chkconfig --add hpcpi
# cexec -x $nn /sbin/service hpcpi start
# cexec -x $nn /sbin/service xinetd restart
Using cexec to Run RPM on the Clients
This procedure installs the HPCPI and Xtools packages by using cexec to run RPM on each
client. You update the golden image after the packages are installed on the clients so that future
image distributions will include the HPCPI and Xtools software. To use this procedure, follow
these steps:
1. Log in as superuser on the head node.
2. If previous versions of the packages are already installed on the system, use rpm -e to
remove them:
# rpm -e package_file package_file...
IMPORTANT: Do not use the rpm -U command to update previous versions of the packages.
3. Use RPM to install the packages on the head node as follows:
# rpm -ivh package_file package_file...
4. Copy the package files to the shared directory /hptc_cluster as follows:
# cp package_file package_file ... /hptc_cluster
5. Verify that HPCPI and Xtools are not running on the client, and that no time-sensitive tasks
are running.
Run RPM on each remote client. You can do this using cexec, or you can use the job
scheduler to submit one job for each node, where the job runs rpm on the target client. For
example:
# cexec -x `nodename` -a 'cd /hptc_cluster ;\
rpm -i package_name package_name ...'
6. After you have installed the packages on all cluster clients, remove the package files from
the /hptc_cluster directory as follows:
# cd /hptc_cluster
# rm package_name package_name ...
7. Create a new golden image but do not set the clients for network reboot as follows:
# updateimage --gc `nodename` --no-netboot
Where `nodename` resolves to the name of the head node (the image server).
For example:
# rpm -ivh hpcpi-0.6.6-rh4.ia64.rpm \
xtools-common-0.6.6-rh4.ia64.rpm \
xtools-clients-0.6.6-rh4.ia64.rpm \
xtools-xc_clients-0.6.6-rh4.ia64.rpm
# cp hpcpi-0.6.6-rh4.ia64.rpm \
xtools-common-0.6.6-rh4.ia64.rpm \
xtools-clients-0.6.6-rh4.ia64.rpm \
xtools-xc_clients-0.6.6-rh4.ia64.rpm \
/hptc_cluster
# cexec -x `nodename` -a 'cd /hptc_cluster ;\
rpm -i hpcpi-0.6.6-rh4.ia64.rpm \
xtools-common-0.6.6-rh4.ia64.rpm \
xtools-clients-0.6.6-rh4.ia64.rpm \
xtools-xc_clients-0.6.6-rh4.ia64.rpm'
# cd /hptc_cluster
# rm hpcpi-0.6.6-rh4.ia64.rpm \
xtools-common-0.6.6-rh4.ia64.rpm \
xtools-clients-0.6.6-rh4.ia64.rpm \
xtools-xc_clients-0.6.6-rh4.ia64.rpm
# updateimage --gc `nodename` --no-netboot
Verifying the Installation
To verify the installation, query RPM as follows:
# rpm -q package_name
If the package is properly installed, RPM displays a message with the package version string.
For example:
# rpm -q hpcpi
hpcpi-0.6.6-200712101436
Removing the Software
To remove the HPCPI and Xtools software, verify that no one is using HPCPI or Xtools, as
described in “Before Installing the Software” (page 26). Remove the software using the rpm -e
command as follows:
# rpm -e package_file package_file...
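For example, assuming all four packages are installed, you can remove them with a single
command (rpm -e takes package names, not package file names):
# rpm -e xtools-xc_clients xtools-clients xtools-common hpcpi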
3 Getting Started with HPCPI
This chapter shows the commands used in a simple HPCPI user session.
NOTE: The program analyzed in this chapter is a simple program selected for illustrative
purposes and is not representative of the types of programs most users analyze.
This chapter addresses the following topics:
• “Simple HPCPI Session” (page 31)
• “Step 1: Loading the HPCPI Environment” (page 31)
• “Step 2: Setting the HPCPI Database Environment Variable (HPCPIDB)” (page 31)
• “Step 3: Creating the HPCPI Database Directory” (page 32)
• “Step 4: Starting the HPCPI Daemon” (page 32)
• “Step 5: Running the Code You Want to Analyze” (page 32)
• “Step 6: Flushing the HPCPI Data to Disk” (page 32)
• “Step 7: Viewing Per Image Statistics for the System” (page 32)
• “Step 8: Viewing Per Procedure Statistics for the Application” (page 33)
• “Step 9: Viewing Per Instruction Statistics” (page 33)
• “Step 10: Stopping the HPCPI Daemon” (page 33)
Simple HPCPI Session
The following HPCPI session shows the basic commands for setting up and starting HPCPI to
collect data, and commands to analyze the data. The steps are numbered and described in the
sections that follow.
% module load hpcpi              #1 Load the HPCPI environment
                                 #   (or setenv PATH /opt/hpcpi/bin:$PATH)
% setenv HPCPIDB /tmp/hpcpidb    #2 Set the DB environment variable
% mkdir -p $HPCPIDB              #3 Create the DB directory
% hpcpid                         #4 Start the HPCPI daemon
% ./myApp                        #5 Run the image you want to profile
% hpcpictl flush                 #6 Flush the data
% hpcpiprof                      #7 View per-image statistics
% hpcpiprof myApp                #8 View per-procedure statistics
% hpcpilist routine1 myApp       #9 View per-instruction statistics
% hpcpictl quit                  #10 Stop the HPCPI daemon
Step 1: Loading the HPCPI Environment
The HPCPI daemon and utilities are installed in the /opt/hpcpi/bin directory by default. On
systems with the modules utility installed, you can enter the following command to add
/opt/hpcpi/bin to your PATH environment variable:
% module load hpcpi
Alternatively, you can manually add /opt/hpcpi/bin to your PATH. For example, you can
enter the following command:
% setenv PATH /opt/hpcpi/bin:$PATH
Step 2: Setting the HPCPI Database Environment Variable (HPCPIDB)
The environment variable HPCPIDB enables you to set the default HPCPI database directory for
the HPCPI daemon and HPCPI utilities. Enter the appropriate shell command to set the HPCPIDB
environment variable to the directory you want to use for the HPCPI database. For example, you
can use a command similar to the following:
% setenv HPCPIDB my_directory
You will create the directory in the next step.
The following example uses the directory /tmp/hpcpidb:
% setenv HPCPIDB /tmp/hpcpidb
For information about selecting directories for HPCPI databases, see “Selecting a Location for
the HPCPI Database Directory” (page 36).
Step 3: Creating the HPCPI Database Directory
Enter the following command to create the directory for the database:
% mkdir -p $HPCPIDB
Step 4: Starting the HPCPI Daemon
Enter the following command to start the HPCPI daemon (hpcpid):
% hpcpid
The HPCPI daemon displays information about the events it monitors. By default, the daemon
monitors CPU cycles used. For information on specifying other events for monitoring, see
“Selecting Events to Monitor” (page 37).
Step 5: Running the Code You Want to Analyze
Run the code you want to analyze. In this example, the user analyzes the code for myApp:
% ./myApp
Step 6: Flushing the HPCPI Data to Disk
After you run the application, enter the following command to flush the HPCPI data to disk:
% hpcpictl flush
By default, the hpcpid daemon flushes data to disk every 10 minutes. Running the hpcpictl
flush command causes hpcpid to immediately flush the data to disk so the HPCPI analysis
tools can read the data.
Step 7: Viewing Per Image Statistics for the System
The following command displays per-image statistics for all binary images that were active on
the system and for which HPCPI recorded performance data:
% hpcpiprof
The hpcpiprof utility prints a header, followed by a table with an entry containing data for
each image or procedure. In this example, the header is as follows:
Event Name   Events          Period   Samples
----------   -------------   ------   ---------
CPU_CYCLES   7969037220000   60000    132817287
The header contains one entry for each event.
Next, hpcpiprof displays a data table with per-image statistics:
CPU_CYCLES   %       cum%     image
----------   -----   ------   ----------------------------
385649e7     48.4%   48.4%    vmlinux-2.6.9-34.7hp.XCsmp
198708e7     24.9%   73.3%    libm-2.3.4.so
192510e7     24.2%   97.5%    myApp
10636e7      1.3%    98.8%    libperl.so
:
:
For descriptions of the output data, see “Viewing Per-Image Data: hpcpiprof” (page 44).
Step 8: Viewing Per Procedure Statistics for the Application
The following command enables you to view per-procedure statistics for the image myApp:
% hpcpiprof myApp
The output is as follows:
Event Name   Events          Period   Samples
----------   -------------   ------   --------
CPU_CYCLES   1925103240000   60000    32085054

CPU_CYCLES   %       cum%     procedure     image
----------   -----   ------   -----------   -----
191201e7     99.3%   99.3%    routine1      myApp
1309e7       0.7%    100.0%   unknown_rou   myApp
For descriptions of the output data, see “Viewing Per-Procedure Data: hpcpiprof image_name”
(page 46).
Step 9: Viewing Per Instruction Statistics
The following command enables you to view per-instruction statistics for the procedure routine1
in myApp:
% hpcpilist routine1 myApp
The output is as follows:
Event Name   Events          Period
----------   -------------   ------
CPU_CYCLES   1912010880000   60000

CPU_CYCLES   PC                B   ASM
----------   ---------------   -   ------------------------------------
0            routine1+0x0000       alloc   r33=ar.pfs,0,4,0,0
0            routine1+0x0001       adds    r34=0,sp
0            routine1+0x0002       adds    sp=-16,sp
0            routine1+0x0010       adds    r35=0,gp
0            routine1+0x0011       nop.f   0
0            routine1+0x0012       mov     r32=rp;;
2333e6       routine1+0x0020       ldfs    f7=[r34]
:
:
For descriptions of the output data, see “Viewing Per-Instruction Data: hpcpilist
procedure_name image_name” (page 47).
Step 10: Stopping the HPCPI Daemon
When you are done collecting data, enter the following command to stop the HPCPI daemon:
% hpcpictl quit
4 Using HPCPI
This chapter describes how to perform basic HPCPI tasks, including how to start HPCPI, control
the HPCPI daemon, and view data using HPCPI tools. This chapter also includes tips on using
HPCPI.
This chapter addresses the following topics:
• “Starting HPCPI” (page 35)
— “Setting Up the HPCPI Environment” (page 35)
— “Selecting a Location for the HPCPI Database Directory” (page 36)
— “Setting the Default Database Directory Environment Variable (HPCPIDB)” (page 36)
— “Starting the hpcpid Daemon” (page 36)
— “Selecting Events to Monitor” (page 37)
• “Running an Application for Analysis” (page 40)
• “Controlling the Daemon with hpcpictl” (page 41)
• “Viewing Data with hpcpiprof, hpcpilist, and hpcpitopcounts” (page 43)
• “Viewing Per-Image Data: hpcpiprof” (page 44)
• “Viewing Per-Procedure Data: hpcpiprof image_name” (page 46)
• “Viewing Per-Instruction Data: hpcpilist procedure_name image_name” (page 47)
• “Listing the Instructions with the Highest Event Counts: hpcpitopcounts” (page 49)
• “Listing Instructions in an Image: hpcpitopcounts image_name” (page 50)
• “HPCPI Utility Options” (page 51)
— “Specifying an Alternate Database” (page 51)
— “Specifying an Alternate Epoch” (page 51)
— “Selecting Data by System” (page 51)
— “Specifying Events to Display” (page 52)
— “Selecting Data by Label” (page 52)
— “Specifying an Alternate Sort Key” (page 53)
— “Limiting the hpcpiprof Output” (page 54)
— “Displaying Raw Values” (page 53)
— “Additional Options” (page 54)
• “Tips and Best Practices for Using HPCPI” (page 55)
Starting HPCPI
This section describes the tasks you must complete to start HPCPI, and describes operating
parameters you can set when you start HPCPI.
Setting Up the HPCPI Environment
On systems with the modules utility, enter the following command to set up the HPCPI
environment:
% module load hpcpi
Alternatively, you can manually add the HPCPI binary directory (/opt/hpcpi/bin) to your
PATH environment variable, and /opt/hpcpi/man to your MANPATH. For example:
% setenv PATH /opt/hpcpi/bin:$PATH
% setenv MANPATH /opt/hpcpi/man:$MANPATH
Selecting a Location for the HPCPI Database Directory
The HPCPI database directory contains files with performance data. The files are organized in
subdirectories by epoch date and system name (see “HPCPI Database Directories and Files”
(page 113)). The hpcpid daemon writes data to the files and the HPCPI analysis tools
(hpcpiprof, hpcpilist, hpcpitopcounts, and hpcpicat) read data from the files.
When selecting the location for the HPCPI database directory, verify that the directory meets
the following requirements:
• The directory must be writable by the user who starts the hpcpid daemon. (The hpcpid
daemon writes the data files with the user ID of the user invoking the daemon, so the
directory must be writable by that user.)
• The directory must have sufficient space available for the HPCPI data. HP recommends that
you allocate at least 20 MB for HPCPI data.
Setting the Default Database Directory Environment Variable (HPCPIDB)
The environment variable HPCPIDB enables you to set the default HPCPI database directory for
the HPCPI daemon and HPCPI utilities. For example, the following command sets the HPCPIDB
environment variable to the /tmp/hpcpidb directory:
% setenv HPCPIDB /tmp/hpcpidb
Starting the hpcpid Daemon
To start the hpcpid daemon, enter the following command:
% hpcpid
By default, hpcpid does the following:
• Runs as a daemon. The hpcpid process places itself in the background, detaches from the
terminal, and redirects its output to its log file.
• Monitors the CPU cycles used. To monitor other events, use the -events option, as described
in the next section (“Selecting Events to Monitor” (page 37)).
• Starts a new epoch, or time period in the database. Epochs provide a method for partitioning
data by time, and by default hpcpid creates a new epoch each time it starts.
The daemon creates a new subdirectory below the database directory. The directory name
is based on the Greenwich Mean Time (GMT) timestamp for the start of the epoch, in the
format YYYYMMDDHHMM (year-month-day-hours-minutes). For more information, see “HPCPI
Database Directories and Files” (page 113).
The hpcpid command supports options that affect its behavior, including options to select the
events to be monitored, specify the database directory, and run the hpcpid as a foreground
process (instead of as a daemon). For more information, see hpcpid(1).
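For example, the following command starts the daemon and monitors the IPCEvents event set
(described in “Selecting Events to Monitor” (page 37)) instead of the default CPU cycles event:
% hpcpid -events IPCEvents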
Startup Information for hpcpid
When hpcpid starts, it displays information similar to the following:
*
* HP ONLY -- This program and its usage is HP PROPRIETARY
*
Image build time: 14:33:11 Dec 10 2007                     (1)
Using info for 'Itanium 2 (model 0x1f)' PMU
1 tags, user definition:                                   (2)
pretty formal     interval duty   randomize
------ ---------- -------- ------ ---------
Cycles CPU_CYCLES 60000    always no
maintainVCT = false
1 groups; user definition:                                 (3)
#  4          5       6       7
1  CPU_CYCLES <empty> <empty> <empty>
---multiplexing interval = 1000000
---Logging to /usr/users/who1/myDB/hpcpid-node6.log        (4)
Daemon is running on pid 1297

Many of the data fields are for HP use only. You can use the following data fields:
(1) Build date for the daemon. Include this information when reporting HPCPI problems.
(2) Table showing the events hpcpid is monitoring.
(3) Table showing the number of event groups and the events in each group. If hpcpid is
monitoring more events than the number of hardware event counters available for the
processor PMU, hpcpid places the events in duty groups and cycles through the duty groups
so that only the events in one duty group are monitored at any time. This table shows that
the processor on this system supports four event counters, numbered 4, 5, 6, and 7.
(4) Location of the log file for the daemon. The daemon writes log data (errors, warnings, and
debugging information) to a file in the base of the HPCPI database directory. You can use
this information to verify the HPCPI database directory. In this example, the database
directory is /usr/users/who1/myDB. See hpcpid(1) for more information.
Selecting Events to Monitor
By default, hpcpid monitors CPU cycles used. Use the -events option to specify alternate
events when you start the hpcpid daemon. The HPCPI analysis tools display data for all events
monitored by default, but you can specify a subset of the monitored events for the output.
The Intel Itanium and AMD Opteron PMUs can monitor numerous events. To simplify the
specification of commonly used events, HPCPI provides predefined event sets, which contain
multiple events. Table 4-1 (page 38) lists some of the predefined event sets.
To view the names of the valid events for your processor, enter the following command:
% hpcpid -show-events
To view the names of the valid event sets for your processor and the events contained in each
set, enter the following command:
% hpcpid -show-event-sets
To specify the events you want to monitor, start the hpcpid with one or more -events
statements. The syntax is as follows:
hpcpid -events
event_name|event_set_name[:interval=value][,event_name|event_set_name...]
[-events...]
Where:
event_name       Specifies an event name. Event names that end in .ALL typically have
                 related event names that start with the same or similar text.
event_set_name   Specifies an event set name.
value            Specifies the event interval, which is the number of times an event is
                 recorded by the PMU before generating an interrupt for hpcpid to
                 record a sample.
                 Range: 2000-65535
                 Default: 60000
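For example, the following command (a sketch using Itanium event names that appear in the
output examples in this guide) monitors CPU_CYCLES and NOPS_RETIRED, sampling both at
an interval of 30000:
% hpcpid -events CPU_CYCLES:interval=30000,NOPS_RETIRED
Because both events are specified in one -events statement, the interval qualifier applies to
both, as described in “Modifying the Event Interval Value” (page 38).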
Commonly Used Event Sets
Table 4-1 describes some of the more commonly used event sets. To see a complete list and the
events contained in each group, use the hpcpid -show-event-sets command.
Table 4-1 Commonly Used Event Sets
Event Set Name    Description
--------------    -----------
HelpMeEvents      A large number (approximately 20) of events, including metrics for
                  instruction counts, instruction bubbles, and cache delays.
IPCEvents         Events for computing instructions per cycle (IPC). Computing IPC for
                  Itanium processors requires multiple events. See “Itanium Instruction
                  Metrics” (page 57) for more information.
FPUEvents         Events for floating point units (FPUs).
StallEvents       Events for execution stalls. On AMD processors, this is a moderate
                  number of events. On Itanium systems, this is a large number of events.
CacheMissEvents   Events for data cache misses. Includes cache misses at different levels.
DCacheEvents      Data cache events.
ICacheEvents      Instruction cache events.
BranchEvents      Branch-related events. For example, branch taken, branch not taken,
                  predicted, mispredicted, predicted correct target, and predicted wrong
                  target.
BranchEvents2     Additional branch-related events. Supported on Itanium processors only.
ServerEvents      Events for basic servers, such as CPU cycles used, stalls, memory loads
                  with long delays, and retired instructions.
HPCEvents         Events for High Performance Computing (HPC) systems.
TIP: If you do not want to run your application multiple times, you can collect a large number
of metrics with the HelpMeEvents event set. This event set enables you to run your application
once and capture events for deriving useful metrics. You can run the HPCPI display utilities
multiple times and select different events for the output.
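For example, one possible session using this approach (commands and options as documented
in this chapter) collects the HelpMeEvents data once and then examines different events from
the same run:
% hpcpid -events HelpMeEvents
% ./myApp
% hpcpictl flush
% hpcpiprof -event CPU_CYCLES
% hpcpiprof -event all-NOPS_RETIRED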
Modifying the Event Interval Value
Decreasing the event interval value increases the number of interrupts generated to schedule
hpcpid, which can affect the statistics collected.
If you specify multiple event or event set names with an -events option, the interval qualifier
applies to all specified events or event sets. If you specify an interval qualifier for an event set
and all events in the set have the same default interval, the specified interval applies to all events
in the set. If the events in the set do not have the same default interval, the specified interval
applies to the first event in the set and hpcpid changes the intervals for the other events in the
set to values that retain the relative proportions of the intervals.
Event Duty Qualifier
The -events statement also supports a duty qualifier, which enables you to control how often
an event is monitored when you are monitoring more events than the number of hardware event
counters. For more information, see hpcpid(1).
Running an Application for Analysis
After you start the HPCPI daemon, you can run the applications you want to analyze; run the
applications as you normally would. If you want to use an HPCPI label to isolate data for a
specific process, you can start the process and establish the label using the hpcpictl label
command.
Labeling Data
An HPCPI label enables you to isolate performance data for processes according to process ID,
process group ID, user ID, or CPU number. To create a label, use the hpcpictl label command.
You can specify the label name when using HPCPI analysis tools to select the data set for the
label.
In its simplest form, the hpcpictl label command has the following syntax:
hpcpictl label label_name command [arg...]
The hpcpictl label command starts a label with the specified name and starts a process to
execute the specified command. HPCPI associates performance data from the process it starts
with the label. The optional arguments are arguments for the specified command.
The duration of the label is the lifetime of the process.
For example, the user enters the following commands:
% hpcpictl label myLabel myApp
% hpcpictl flush
% hpcpiprof -label myLabel
The hpcpictl label command also enables you to select data for processes according to process ID,
process group ID, user ID, or CPU number. Selecting data by process group ID is useful when
profiling utilities that spawn additional processes. For example, the following command executes
the command make all and associates data from all processes with the same process group ID
as the make process (-pgid this) with the label make_all:
% hpcpictl label make_all -pgid this make all
For more information about using HPCPI labels, see Chapter 5 (page 59).
Controlling the Daemon with hpcpictl
The hpcpictl utility is a userspace application that controls the operation of the hpcpid
daemon. You can use hpcpictl to do the following:
• Flush HPCPI data to disk (hpcpictl flush)
• Stop the HPCPI daemon (hpcpictl quit)
• Start a new epoch (hpcpictl epoch)
• Show information about the HPCPI daemon (hpcpictl show)
Flushing Data to Disk: hpcpictl flush
By default, the hpcpid daemon flushes data to disk every 10 minutes. Run the hpcpictl
flush command to immediately flush the data to disk so the HPCPI analysis tools can read the
data.
CAUTION: If you do not flush HPCPI data before running the HPCPI display utilities, the
output from the utilities might not be accurate.
Stopping the Daemon: hpcpictl quit
The hpcpictl quit command stops the hpcpid daemon.
TIP: Only one instance of the HPCPI daemon (hpcpid) can run on a system. Because hpcpid
runs as a daemon (it detaches from your session), the hpcpid process does not terminate when
you end your user session. After you finish collecting data, stop the daemon as a courtesy to
other users on the system.
Starting a New Data Epoch: hpcpictl epoch
The hpcpictl epoch command starts a new epoch after flushing all in-memory HPCPI data
to disk. By default, the HPCPI analysis tools display data from the most recent epoch, but you
can specify alternate epochs.
HPCPI stores data for each epoch in a separate subdirectory under the HPCPI data directory.
The default epoch naming convention uses the GMT (Greenwich Mean Time) timestamp for the
start of the epoch, in the format YYYYMMDDHHMM (year-month-day-hours-minutes). For more
information, see “HPCPI Database Directories and Files” (page 113).
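For example, an epoch started at 19:59 GMT on February 12, 2008 is stored in a subdirectory
named 200802121959 (the epoch name shown in the hpcpictl show output in the next section).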
Displaying HPCPI Status Information: hpcpictl show
The hpcpictl show command displays status information from the hpcpid daemon. This
command is useful for determining the current HPCPI database directory and epoch. For example:
% hpcpictl show
Current database directory: /var/users/who1/foo
Current epoch:              200802121959
Local machine name:         onfire16
VCT info:
  Device name:        /dev/ecount
  offset to vct:      0x20c0
  ncpus:              2
  mmap offset for 0:  0x40000000
  mmap offset for 1:  0x41000000
  nevents:            1
Events being monitored:
pretty proper name  interval rnd duty   active
------ -----------  -------- --- ------ ------
Cycles CPU_CYCLES   60000    no  always 1/1
hpcpictl show successful
Viewing Data with hpcpiprof, hpcpilist, and hpcpitopcounts
HPCPI provides the following utilities to display HPCPI data:
• hpcpiprof
Displays performance profiles for systems (per-image data) or images (per-procedure data).
• hpcpilist
Lists per-line performance statistics for a procedure.
• hpcpitopcounts
Lists the instructions with the most counts for performance events.
HPCPI also includes the hpcpicat utility, which displays the contents of a performance data
file with minimal formatting. This utility is primarily a debugging tool for advanced users and
is not intended for general performance profiling. This section does not include information
about using hpcpicat. For more information about hpcpicat, see “hpcpicat Output”
(page 114).
Default Input Data
By default, the hpcpiprof, hpcpilist, and hpcpitopcounts utilities select input data as
follows:
• Database directory: The directory specified by the HPCPIDB environment variable.
• Epoch: The most recent epoch.
• System: All systems with data in the database within the selected epoch. For single-system
programs, there is data only from the local system.
The HPCPI database structure contains a subdirectory for each system that writes
performance data in the epoch. This feature enables multiple systems to share an HPCPI
database, and is useful in cluster environments.
• Labels: All labels. If there is no labeled data, the utilities display all data that meet the other
selection criteria.
• Events: All events monitored by hpcpid. By default, hpcpid monitors only CPU cycles.
You can specify additional or alternate events to monitor when you start hpcpid, as described
in “Selecting Events to Monitor” (page 37).
“HPCPI Utility Options” (page 51) describes how to specify alternate input data for the
hpcpiprof, hpcpilist, and hpcpitopcounts utilities.
Flushing Data: hpcpictl flush
Before running the HPCPI display utilities, flush the data by entering the following command:
% hpcpictl flush
CAUTION: If you do not flush HPCPI data before running the HPCPI display utilities, the
output from the utilities might not be accurate.
Viewing Per-Image Data: hpcpiprof
If you run hpcpiprof without an image name, it displays statistics for the system, partitioned
per-image. For example:
$ hpcpiprof
Event Name   Events          Period   Samples
----------   -------------   ------   ---------
CPU_CYCLES   7969037220000   60000    132817287

CPU_CYCLES   %       cum%     image
----------   -----   ------   ----------------------------
385649e7     48.4%   48.4%    vmlinux-2.6.9-34.7hp.XCsmp
198708e7     24.9%   73.3%    libm-2.3.4.so
192510e7     24.2%   97.5%    myApp
10636e7      1.3%    98.8%    libperl.so
4963e7       0.6%    99.4%    libc-2.3.4.so
:
:
The output consists of a header and a data table, as described in the sections that follow.
HPCPI Header
The output for HPCPI utilities starts with a header that summarizes the events selected for
display. By default, the utilities display statistics for all events monitored. For example:
Event Name   Events          Period   Samples
----------   -------------   ------   ---------
CPU_CYCLES   7969037220000   60000    132817287

The header contains one entry for each event. The columns are as follows:
Event Name        Lists the event name.
Events            Lists the count for the event as calculated by HPCPI. This is the number
                  of sampled events times the sample interval. If the HPCPI daemon
                  sampled more events than the number of performance counters, HPCPI
                  adjusts the event count for the fraction of duty groups in which the
                  event was active.
Period            Lists the sampling interval for the event.
Samples           Lists the total number of samples HPCPI recorded for the event.
Active Fraction   The fraction of time the event was active in the PMU, expressed as a
                  percentage. This column is present only if the number of events sampled
                  was greater than the number of performance counters in the PMU.
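In the example above, 132817287 samples × 60000 (the Period value) = 7969037220000, the value
shown in the Events column.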
hpcpiprof Image Data Table
The image data table contains one entry per image. The entries are listed in descending order
according to the event count for the image.
The columns contain the following information:
event_name
Lists the count for the event (such as CPU_CYCLES) that occurred in the image,
as calculated by HPCPI. This is the number of sampled events multiplied by
the sample interval. If the HPCPI daemon sampled more events than the number
of performance counters, HPCPI adjusts the event count for the fraction of duty
groups in which the event was active.
By default, the HPCPI utilities display values in exponential notation, scaled
uniformly so that the largest value has six significant figures. To display the
values using raw numbers, specify the -raw-numbers option, as described in
“Displaying Raw Values” (page 53).
%       Lists the percentage of event samples for the event type that occurred in the
        image.
cum%    Lists the cumulative percentage of all event samples for this entry and all entries
        above it. In this example, the event count for the first image
        (vmlinux-2.6.9-34.7hp.XCsmp) is 48.4% of the recorded total, and the event
        count for the first and second images together is 73.3% of the recorded total.
image   Lists the image name.
hpcpiprof Output with Multiple Events
The following listing shows hpcpiprof output on a system where HPCPI monitored four events.
The header table contains an entry for each event. The first event in the table is CPU_CYCLES,
and the first column in the data table contains the event count for this event. The values in the
percentage (%) and cumulative percentage (cum%) columns are also for the CPU_CYCLES event.
Columns with event counts for the additional events (NOPS_RETIRED,
PREDICATE_SQUASHED_RETIRED, and IA64_INST_RETIRED) are to the right of the cumulative
percentage (cum%) column.
% hpcpiprof
Event Name                   Events          Period   Samples
--------------------------   -------------   ------   ---------
CPU_CYCLES                   3746638560000   60000    62443976
NOPS_RETIRED                 3667613640000   60000    61126894
PREDICATE_SQUASHED_RETIRED   48274800000     6000     8045800
IA64_INST_RETIRED            8966192880000   60000    149436548

                               NOPS_      PREDICATE_SQUASHED_   IA64_INST_
CPU_CYCLES   %       cum%      RETIRED    RETIRED               RETIRED      image
----------   -----   -----     --------   -------------------   ----------   --------------------------
231280e7     61.7%   61.7%     283236e7   61e7                  689484e7     vmlinux-2.6.9-34.7hp.XCsmp
78618e7      21.0%   82.7%     29426e7    2e7                   97186e7      libm-2.3.4.so
54924e7      14.7%   97.4%     50809e7    4519e7                99522e7      myApp
:
:
The hpcpiprof utility sorted the entries in the data table according to the CPU_CYCLES event count. When
hpcpiprof displays information for multiple events, it sorts the data table entries according to
the event count for the first event in the HPCPI database. To specify an alternate sort key, use
the -st event_name option, as described in “Specifying an Alternate Sort Key” (page 53).
Viewing Per-Procedure Data: hpcpiprof image_name
If you run hpcpiprof with an image name, it displays statistics for the image, partitioned
per-procedure. For example:
% hpcpiprof myApp
myApp: not found.
+ Found and using /var/users/who1/bin/myApp
-------
Event Name   Events          Period   Samples
----------   -------------   ------   --------
CPU_CYCLES   1925103240000   60000    32085054

CPU_CYCLES   %       cum%     procedure     image
----------   -----   ------   -----------   -----
191201e7     99.3%   99.3%    routine1      myApp
1309e7       0.7%    100.0%   unknown_rou   myApp
The output is described in the following sections.
HPCPI Procedure Header
The header uses the common HPCPI format, as described in “HPCPI Header” (page 44), with
the following differences:
• The utility searches for a binary file with the specified image name and a checksum value
that matches the checksum value for the image that generated the event data.
In this example, myApp is a relative pathname and the file is not in the current directory.
The message myApp: not found indicates that hpcpiprof did not find the image myApp
in the current directory with the appropriate checksum value.
The message + Found and using /var/users/who1/bin/myApp indicates the full
path of the binary it found.
• The event count and sample values are relative to the events recorded for the image.
hpcpiprof Procedure Data Table
The data table for procedure data uses the format as described in “hpcpiprof Image Data Table”
(page 44), with the following differences:
• The column procedure contains the procedure name. The procedure name unknown_rou
represents events that cannot be attributed to a specific routine. For more information, see
“HPCPI Product Limitations” (page 115).
• The values in each entry are relative to the events recorded for the procedure in the image.
Viewing Per-Instruction Data: hpcpilist procedure_name
image_name
The hpcpilist utility lists HPCPI performance statistics per line of source and/or assembly
code in a procedure within the specified image file.
For example:
% hpcpilist routine1 myApp
myApp: not found.
+ Found and using /var/users/who1/bin/myApp
-------
Event Name   Events          Period
----------   -------------   ------
CPU_CYCLES   1912010880000   60000

CPU_CYCLES   PC                B   ASM
----------   ---------------   -   ------------------------------------
0            routine1+0x0000       alloc   r33=ar.pfs,0,4,0,0
0            routine1+0x0001       adds    r34=0,sp
0            routine1+0x0002       adds    sp=-16,sp
0            routine1+0x0010       adds    r35=0,gp
0            routine1+0x0011       nop.f   0
:
58881e5      main+0x0050           adds    r14=8,r36;;
61012e5      main+0x0051           ldfs    f7=[r14]
0            main+0x0052           addl    r14=160,gp;;
52402e5      main+0x0060           ldfs    f6=[r14];;
327227e5     main+0x0061           nop.m   0
:
:
The output is described in the following sections.
hpcpilist Header
The header uses the HPCPI header format for procedures, as described in “HPCPI Procedure
Header” (page 46).
hpcpilist Data Table
The data table contains one entry per instruction. The entries are listed in ascending order
according to the address offset for the instruction.
The columns contain the following information:
event_name   Lists the count for the event (such as CPU_CYCLES) recorded for the instruction,
             as calculated by HPCPI. This is the number of sampled events multiplied by
             the sample interval. If the HPCPI daemon sampled more events than the
             number of performance counters, HPCPI adjusts the event count for the
             fraction of duty groups in which the event was active.
PC           The routine name and the address offset for the instruction pointer or program
             counter (PC).
B            The branch target. A colon in this column indicates the instruction is the target
             of a branch (including loop exits).
ASM          The assembly code instruction (disassembled from the image).
Source       The name of the source file and line offset. This field is present only if
             hpcpilist is able to locate the source file.
Interpreting hpcpilist Event Counts
The value of the instruction pointer recorded is typically several or many instructions after the
instruction that caused the event. This lag or skid is common to all profilers that sample instruction
pointers and HPCPI does not attempt to model the system to correct for this. As a result, HP
recommends that you examine the assembly code surrounding regions where high event counts
occur and consider if the surrounding code might be triggering the events. For more information,
see “HPCPI Product Limitations” (page 115).
Listing the Instructions with the Highest Event Counts: hpcpitopcounts
The hpcpitopcounts utility displays the n instructions with the highest counts for an event.
By default, n is 100. To display an alternate number of instructions, use the -n option.
If the HPCPI daemon monitored multiple events, hpcpitopcounts uses the first event in the
database as the sort key. To specify an alternate sort key, use the -st option, as described in
“Specifying an Alternate Sort Key” (page 53).
If you run hpcpitopcounts without an image name, it searches all profile files for the
instructions with the top counts. For example:
% hpcpitopcounts
Event Name   Events          Period   Samples
----------   -------------   ------   --------
CPU_CYCLES   5637154800000   60000    93952580

CPU_CYCLES   %      cum    procedure      pc                   instruction          image                        source
----------   ----   ----   ------------   ------------------   ------------------   --------------------------   -----------
1364e09      24.2   24.2   default_idle   0xa000000100017640   adds  r16=0xde0,tp   vmlinux-2.6.9-34.7hp.XCsmp   unknown_src
1303e09      23.1   47.3   default_idle   0xa000000100017630   hint.m 0             vmlinux-2.6.9-34.7hp.XCsmp   unknown_src
1269e09      22.5   69.8   default_idle   0xa000000100017620   nop.m  0             vmlinux-2.6.9-34.7hp.XCsmp   unknown_src
:
:
The output is described in the following sections.
hpcpitopcounts Header
The header uses the HPCPI header format for procedures, as described in “HPCPI Procedure
Header” (page 46).
hpcpitopcounts Data Table
The data table contains one entry per instruction. The entries are listed in descending order
according to the event count for the instruction.
The columns contain the following information:
event_name    The count for the event (such as CPU_CYCLES) recorded for the instruction,
              as calculated by HPCPI. This is the number of sampled events multiplied by
              the sample interval. If the HPCPI daemon sampled more events than the
              number of performance counters, HPCPI adjusts the event count for the
              fraction of duty groups in which the event was active.
%             The percentage of event samples for the event type recorded for the instruction.
cum%          The cumulative percentage of all event samples for this entry and all entries
              above it.
procedure     The procedure name.
pc            The offset for the instruction pointer or program counter.
instruction   The assembly code instruction (disassembled from the image).
image         The name of the image.
source        The name of the source file and line offset.
Listing Instructions in an Image: hpcpitopcounts image_name
You can run hpcpitopcounts with an image name to list the instructions with the highest
event counts within an image. For example:
% hpcpitopcounts myApp
myApp: not found.
+ Found and using /var/users/who1/bin/myApp
-------
Event Name   Events         Period   Samples
----------   ------------   ------   -------
CPU_CYCLES   116980320000   60000    1949672

CPU_CYCLES   %      cum    procedure   pc                   instruction         image   source
----------   ----   ----   ---------   ------------------   -----------------   -----   -----------
60800e06     52.0   52.0   main        0x4000000000000811   ldfs  farg0=[r14]   myApp   unknown_src
8010e06      6.8    58.8   main        0x40000000000008c0   stfs  [r15]=f6      myApp   unknown_src
7973e06      6.8    65.6   main        0x40000000000008a1   nop.m 0             myApp   unknown_src
:
:
Interpreting hpcpitopcounts Event Counts
The value of the instruction pointer recorded is typically several or many instructions after the
instruction that caused the event. This lag or skid is common to all profilers that sample the
instruction pointer, and HPCPI does not attempt to model the system to correct for this. As a
result, HP recommends that you examine the assembly code surrounding regions where high
event counts occur and consider whether the surrounding code may be triggering the events.
See “HPCPI Product Limitations” (page 115) for more information.
HPCPI Utility Options
This section describes options for the hpcpiprof, hpcpilist, and hpcpitopcounts utilities.
Specifying an Alternate Database
By default, the HPCPI utilities use the value of the HPCPIDB environment variable as the HPCPI
database directory. Use the -db option to specify an alternate database. The syntax is as follows:
-db database
Where:
database   Specifies the directory for the HPCPI database.
Example
The following example specifies the directory /tmp/cpi/MyOtherDB as the HPCPI database
directory:
% hpcpiprof -db /tmp/cpi/MyOtherDB
Specifying an Alternate Epoch
By default, the HPCPI utilities search for data in the latest epoch. Use the -epoch option to
specify an alternate epoch. The syntax is as follows:
-epoch name | latest | latest-k | all
Where:
name       Specifies the name of the epoch. The default epoch naming convention uses the
           GMT timestamp for the start of the epoch, in the format YYYYMMDDHHMM
           (year-month-day-hours-minutes). The epoch name can also be a user-created
           symbolic link to the epoch directory name. For more information, see “HPCPI
           Database Directories and Files” (page 113).
latest     Searches the latest epoch for data. This is the default behavior.
latest-k   Searches the epoch k epochs prior to the current epoch for data.
           For example, to display data from the epoch before the current epoch (such as
           cases where you just started a new epoch and want to analyze the previous epoch),
           specify -epoch latest-1.
all        Searches all epochs for data.
Example
In the following example, the user starts a new epoch before running myApp. After myApp finishes
running, the user closes the epoch and starts a new epoch:
% hpcpictl epoch               #1 Start a new epoch
% ./myApp                      #2 Run the program you want to profile
% hpcpictl epoch               #3 Flush data and start a new epoch
% hpcpiprof -epoch latest-1    #4 Select the epoch with myApp data
The second hpcpictl epoch command flushes the data to disk and starts a new epoch, which
becomes the latest epoch. By default, hpcpiprof displays data from the latest epoch, so in step
4, the user specifies -epoch latest-1 to display data from the epoch that was active when
myApp ran.
Selecting Data by System
By default, the hpcpiprof, hpcpilist, and hpcpitopcounts utilities search for profile files
in all system subdirectories in the epoch. In single-system environments, there is only one system
subdirectory in an epoch (the subdirectory for the local system), and there is no need to select
output from a specific system.
In a cluster environment with a consolidated HPCPI database and synchronized epochs, you
might want to include or exclude the data from specific systems or nodes. Use the -hosts
option to select the systems or nodes whose data is displayed. The syntax to include data from
specific systems or nodes is as follows:
-hosts hostname[,hostname]...
The syntax to exclude data from specific systems or nodes is as follows:
-hosts all-hostname[,hostname]...
Where hostname is the name of the system or node.
Example
The following hpcpiprof command displays data from all systems in the database except node2
and node3:
% hpcpiprof -hosts all-node2,node3
Specifying Events to Display
By default, the HPCPI utilities display data for all events monitored. Many of the utilities display
a column of data for each event, and if hpcpid monitored a large number of events, it may be
difficult to read the output. You can use the -event option to specify a subset of events to
display. The syntax is as follows:
-event event_name[,event_name]...
-event all-event_name[,event_name]...
Where:
event_name[,event_name]...
Selects the event specified by event_name. You can specify additional event names delimited
by commas (,).
all-event_name[,event_name]...
Selects all events monitored except the event specified by event_name. You can specify
additional event names delimited by commas (,) to exclude multiple events.
Examples:
The following hpcpiprof command selects the events CPU_CYCLES and
PREDICATE_SQUASHED_RETIRED:
% hpcpiprof -event CPU_CYCLES,PREDICATE_SQUASHED_RETIRED
The following hpcpiprof command selects all events except NOPS_RETIRED:
% hpcpiprof -event all-NOPS_RETIRED
Selecting Data by Label
By default, the hpcpiprof, hpcpilist, and hpcpitopcounts utilities search for data with
all labels. If there is no labeled data, the utilities display all data that meet the other selection
criteria. To select data with a specific label, use the -label option. The syntax is as follows:
-label label_name [-label label_name...]
Where label_name is the name of the label.
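For example, the following command (using the hypothetical label name myLabel) displays
only the data recorded for that label:
% hpcpiprof -label myLabel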
Extracting Data for a Process from Shared Image Metrics
The performance data associated with the label includes data attributed to code executed in a
shared image (such as a shared library or kernel routine) that was called from the process. In the
following example, the user used the hpcpictl label command to associate performance
data for myApp with the label myLabel. The following command displays performance data for
code called by myApp from libc-2.3.4.so:
% hpcpiprof -label myLabel libc-2.3.4.so
Specifying an Alternate Sort Key
When hpcpiprof and hpcpitopcounts display information for multiple events, the utilities
sort the data table entries according to the event count for the first event in the HPCPI database.
To specify an alternate sort key, use the -st option. The syntax is as follows:
-st event_name
Where event_name is the name of the event.
Example
The following output was created using the same data as the example in “hpcpiprof Output
with Multiple Events” (page 45), but using the retired instruction count (IA64_INST_RETIRED)
instead of CPU cycles as the sort key. The values for the percentage (%) and cumulative percentage
(cum%) are for the event used as the sort key (IA64_INST_RETIRED).
% hpcpiprof -st IA64_INST_RETIRED
Event Name                  Events         Period  Samples
--------------------------  -------------  ------  ---------
CPU_CYCLES                  3746619240000  60000    62443654
NOPS_RETIRED                3667612920000  60000    61126882
PREDICATE_SQUASHED_RETIRED    48274680000   6000     8045780
IA64_INST_RETIRED           8966193360000  60000   149436556

                                                     PREDICATE_
IA64_INST_                              NOPS_        SQUASHED_
RETIRED     %      cum%   CPU_CYCLES    RETIRED      RETIRED     image
----------  -----  -----  ----------    ---------    ----------  --------------------------
689484e7    76.9%  76.9%  231280e7      283236e7     61e7        vmlinux-2.6.9-34.7hp.XCsmp
 99522e7    11.1%  88.0%   54924e7       50809e7     4519e7      myApp
 97186e7    10.8%  98.8%   78618e7       29426e7     2e7         libm-2.3.4.so
:
:
Displaying Raw Values
By default, the HPCPI utilities display values in exponential notation, scaled uniformly so
that the largest value has six significant figures. To display the values as raw numbers, specify
the -raw-numbers option. For example:
% hpcpiprof -raw-numbers
The -raw-numbers option may be useful if some event counts are shown as zero values when
scaled in exponential notation because they are relatively low compared to other event counts.
For example, in the following excerpt from hpcpiprof output, the event counts for
DATA_EAR_EVENTS.CACHE_MISS.GE64 are shown as zero in exponential notation (0e5):
CPU_CYCLES  %      cum%    DATA_EAR_EVENTS.CACHE_MISS.GE64  procedure    image
----------  -----  ------  -------------------------------  -----------  -----
851896e5    89.5%   89.5%  0e5                              main         myApp
100027e5    10.5%  100.0%  0e5                              unknown_rou  myApp
Displaying the same data with the option -raw-numbers shows the nonzero values for
DATA_EAR_EVENTS.CACHE_MISS.GE64 event counts:
CPU_CYCLES   %      cum%    DATA_EAR_EVENTS.CACHE_MISS.GE64  procedure    image
-----------  -----  ------  -------------------------------  -----------  -----
85189620000  89.5%   89.5%  28200                            main         myApp
10002660000  10.5%  100.0%   3000                            unknown_rou  myApp
Limiting the hpcpiprof Output
The hpcpiprof -keep option lists entries only until the cumulative percentage meets a specified
value. This option is useful if you do not want to display entries with low statistical values.
The syntax for the option is as follows:
hpcpiprof -keep percentage
Where percentage is a floating-point number in the range 0 to 100.
For example, the following command displays hpcpiprof per-image data until the cumulative
CPU total is 99.44 percent or greater:
% hpcpiprof -keep 99.44
Event Name  Events         Period  Samples
----------  -------------  ------  ---------
CPU_CYCLES  7969037220000  60000   132817287

CPU_CYCLES  %      cum%   image
----------  -----  -----  --------------------------
385649e7    48.4%  48.4%  vmlinux-2.6.9-34.7hp.XCsmp
198708e7    24.9%  73.3%  libm-2.3.4.so
192510e7    24.2%  97.5%  myApp
 10636e7     1.3%  98.8%  libperl.so
  4963e7     0.6%  99.4%  libc-2.3.4.so
  2066e7     0.3%  99.7%  ipmi_si.ko
Additional Options
The hpcpiprof, hpcpilist, and hpcpitopcounts utilities also support options to perform
the following tasks:
-output-format html   Creates HTML output.
-no-header            Suppresses the header data output. This option is useful for
                      programs that parse the output of the utilities.
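For example, a post-processing script might pipe header-free output to a parser; the awk
filter shown here is only illustrative:
% hpcpiprof -no-header | awk '{print $1, $NF}'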
Tips and Best Practices for Using HPCPI
This section contains tips and best practices for using HPCPI.
Tips
To profile an application, you start by monitoring CPU cycles. After collecting and flushing the
HPCPI data, you can run the hpcpiprof command without specifying an image name to view
system activity, such as kernel and library activity. Next, run the hpcpiprof command with
your image name (hpcpiprof image_name) to determine which procedures are consuming
the most CPU cycles.
Use the hpcpilist command (hpcpilist procedure_name image_name) to view
per-instruction event counts and to determine which areas within a procedure are consuming
the most CPU cycles.
Look for hot spots. A hot spot is an image, procedure, or area of code with either very high or
very low event counts, depending on the event. When analyzing CPU cycles, a hot spot is an
area of code with a high event count. When analyzing other event counts or metrics such as
instructions per cycle, a hot spot can be an area of code with low event counts or metrics.
When examining the output from hpcpilist, note that instructions are bundled and a delay
can occur when recording the instruction pointer. When you encounter an instruction with a
high event count, HP recommends that you consider all instructions near that instruction as
possible sources of the event. For more information, see “HPCPI Product Limitations” (page 115).
The following listing shows a typical command sequence:
% hpcpid
% myApp arg1 arg2            (run your application)
% hpcpictl flush
% hpcpiprof
% hpcpiprof myApp
% hpcpilist routine2 myApp
Using Event Sets
You can also use event sets to monitor a large number of events and run the analysis tools multiple
times with the same data, but display different event statistics each time.
For example, the event set HelpMeEvents monitors a large number of events. You can collect
data with the HelpMeEvents event set. Initially, you view data for CPU cycles only, as shown
in the following example:
% hpcpid -events HelpMeEvents
% myApp arg1 arg2            (run your application)
% hpcpictl flush
% hpcpiprof -event CPU_CYCLES myApp
% hpcpilist -event CPU_CYCLES routine2 myApp
To view statistics for all stalls on an Itanium system, select BACK_END_BUBBLE.ALL events. For
example:
% hpcpiprof -event BACK_END_BUBBLE.ALL myApp
% hpcpilist -event BACK_END_BUBBLE.ALL routine2 myApp
Stopping the Daemon After You Finish Collecting Data (hpcpictl quit)
Only one instance of the HPCPI daemon (hpcpid) can run on a system. Because hpcpid runs
as a daemon (it detaches from your session), the hpcpid process does not terminate when you
end your user session. After you finish collecting data, stop the daemon (hpcpictl quit) as
a courtesy to other users on the system.
Limiting the Event Count Display (hpcpiprof -keep Option)
If you have a lot of data, you can use the -keep option with hpcpiprof to limit the number of
event counts it displays. For example:
% hpcpiprof -keep 99
Using Database Directories, Epochs, or Labels to Organize Your Data
You can use different HPCPI database directories, epochs, or labels to organize performance
data from different applications or instances of an application.
Database Directories
To use a different database directory, you must stop and restart the daemon. You can specify
the database directory name using the -db option when starting the daemon and when running
the analysis tools. You can assign names to the directories that are meaningful to you.
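For example, the following hypothetical session starts the daemon and runs the analysis with a
dedicated database directory:
% hpcpid -db /tmp/cpi/myAppDB
% hpcpiprof -db /tmp/cpi/myAppDB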
For more information, see “Selecting a Location for the HPCPI Database Directory” (page 36)
and “Specifying an Alternate Database” (page 51).
Epochs
To start a new epoch, use the hpcpictl epoch command. When you use epochs, you do not
need to stop and restart the daemon. Another advantage is that labels are retained across epochs.
However, epoch names are not user configurable and are based on the time an epoch is started.
You might have problems remembering the application or activities that correlate to an epoch.
You can create a symbolic link with a meaningful name to an epoch directory, but this is an
additional task.
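For example, the following commands create a symbolic link named myApp_run1 to a
hypothetical epoch directory (named with the default GMT timestamp convention) and then
use the link name with the -epoch option:
% ln -s $HPCPIDB/200803141200 $HPCPIDB/myApp_run1
% hpcpiprof -epoch myApp_run1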
For more information, see “Starting a New Data Epoch: hpcpictl epoch” (page 41) and
“Specifying an Alternate Epoch” (page 51).
Labels
To create a label, use the hpcpictl label command. When you use labels, you do not need
to stop and restart the daemon. The HPCPI label mechanism enables you to partition data using
multiple parameters. Labels are a more flexible method to partition and notate performance data
than epochs.
For more information on using labels, see Chapter 5 (page 59).
Event Intervals
HP recommends that you do not use sampling intervals that are lower than the default values
when using non-Itanium processors.
Non-Itanium processors do not freeze event monitoring when an interrupt occurs to handle
event recording for the PMU. This causes events related to handling event recording to leak and
be attributed to images other than the interrupt handler. Decreasing the HPCPI sampling interval
(increasing the sampling frequency) increases this leakage.
Multiple Duty Groups
If you monitor more events than the number of event counters supported by the processor,
HPCPI cannot monitor all events at the same time. HPCPI places the events in duty groups and
multiplexes (cycles through) the duty groups so that the PMU only counts the events in one duty
group at a time (the active group). This multiplexing increases the leakage described in “Event
Intervals” (page 56). On non-Itanium processors, HP recommends that you do not use data
collected with multiple duty groups for fine-grained analysis.
Itanium Instruction Metrics
On Itanium processors, the event counter IA64_INST_RETIRED includes retired instructions
and retired no operation instructions (NOP_RETIRED) but not retired predicate squashed
instructions (PREDICATE_SQUASHED_RETIRED).
• To calculate the total number of retired instructions, add IA64_INST_RETIRED and
  PREDICATE_SQUASHED_RETIRED.¹
• To determine the number of effective retired instructions, subtract NOP_RETIRED from
  IA64_INST_RETIRED.
Use the event set IPCEvents (hpcpid -events IPCEvents) to monitor all the events needed
to calculate instructions per cycle (CPU_CYCLES, IA64_INST_RETIRED,
PREDICATE_SQUASHED_RETIRED, and NOP_RETIRED).
To calculate the number of total retired instructions per cycle, use the following formula:
(IA64_INST_RETIRED + PREDICATE_SQUASHED_RETIRED)/CPU_CYCLES
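Similarly, to calculate the number of effective retired instructions per cycle (excluding NOPs,
as described above), you can use the following formula:
(IA64_INST_RETIRED - NOP_RETIRED)/CPU_CYCLES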
Measuring Memory Controller and HyperTransport Events
Memory controller events, such as DRAM access, and HyperTransport events are system events.
On multicore processors, these events can be monitored only from core 0. These events are not
attributed to the process or thread that caused them if the process or thread does not execute on
core 0; instead, they are attributed to the process running on core 0 when the event is recorded.
To correctly measure memory controller and HyperTransport events on multicore processors,
restrict execution of the process or threads to a CPU that is core 0. You can use the contents of
the /proc/cpuinfo file to determine which CPUs are core 0 and the taskset utility to launch
a process with a specified CPU affinity.
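For example, assuming /proc/cpuinfo shows that CPU 0 is a core 0 (the exact field names
vary by platform), the following commands display the processor-to-core mapping and then
launch myApp bound to CPU 0:
% grep -E 'processor|core id' /proc/cpuinfo
% taskset -c 0 ./myApp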
HyperTransport Transmit and Receive Events
Only HyperTransport transmit events (data, command, and transmit event types) can be
monitored. There is no direct way to monitor HyperTransport receive events. However, you can
infer receive events by observing the transmit events on the sender. For example, memory
requests from a process running on CPU 1 for memory attached to CPU 2 generate
HyperTransport transmit requests on CPU 1.
Accessing memory from a remote processor generates HyperTransport traffic. A process might
access memory that is more than one hop away, through an intermediate CPU. For example,
a memory request from CPU 1 to CPU 3 might be transmitted through CPU 2. In this case, there
will be HyperTransport transmit events on CPU 1 and CPU 2.
1. “Errata (Processor and PAL)” in Intel® Itanium® 2 Processor Specification Update February 2005 states that the
IA64_RETIRED event count does not include predicated off instructions.
5 Using HPCPI Labels
This chapter describes how to use HPCPI labels. This chapter addresses the following topics:
• “Overview” (page 59)
• “Simple HPCPI Session Using Labels” (page 60)
• “Label Selectors” (page 62)
• “Multiple Labels” (page 64)
• “Reusing Labels” (page 64)
• “Comparing Epochs and Labels” (page 64)
• “Label Examples” (page 65)
• “Creating Labels in Programs” (page 67)
Overview
An HPCPI label enables you to isolate performance data for processes according to process ID,
process group ID, user ID, or CPU number. To create a label, use the hpcpictl label command.
You can specify the label name when using HPCPI analysis tools (hpcpiprof, hpcpilist,
hpcpitopcounts) to select the data set for the label.
When using labels, you must select the processes to generate the event data for the label and the
duration for the label. In its simplest form, the hpcpictl label command has the following
syntax:
hpcpictl label label_name command [arg...]
Using hpcpictl label starts a label with the specified name and executes the specified
command. The optional arguments are arguments for the specified command. By default, the
hpcpictl utility associates performance data from the process for the command it executes
with the label. The duration for the label is the lifetime of the process.
You can also select label data according to other selectors, such as parent PID, process group ID,
session ID, user ID, or CPU number. Examples where alternate selectors are useful include the
following:
• Using the process group ID to select data for utilities that spawn multiple processes, such
as make.
• Using the user ID to select data for utilities that create a new session and process group
when launching processes, such as mpirun.
“Label Selectors” (page 62) describes how to use HPCPI label selectors.
When you have a label for a process, HPCPI also applies the label to events for code executed
in other images by the process, such as routines in a shared library or kernel routines. For example,
if you have a label for the process myApp and myApp executes code from libc (libc-2.3.4.so),
HPCPI associates the events for that libc code with the label.
Simple HPCPI Session Using Labels
In the following session, the user associates the label myLabel with the performance data for a
single process, myApp. This example also uses the label with hpcpiprof to extract performance
data for myApp, including data for routines called by myApp from a shared library.
The following HPCPI session shows the commands for using HPCPI with labels. The steps are
numbered and described in the sections that follow.
% module load hpcpi                       #1 Set up & run hpcpid
% setenv HPCPIDB /tmp/hpcpidb             #   (continued)
% mkdir -p $HPCPIDB                       #   (continued)
% hpcpid                                  #   (continued)
% hpcpictl label myLabel ./myApp          #2 Create the label & run the app
% hpcpictl flush                          #3 Flush the data
% hpcpiprof -label myLabel                #4 Use the label with hpcpiprof
% hpcpiprof -label myLabel libm-2.3.4.so
% hpcpictl quit                           #5 Stop the HPCPI daemon
Step 1: Setting Up the Environment and Starting the Daemon
To set up the HPCPI path and database and to start the daemon, as described in "Simple HPCPI
Session" (page 31), enter the following commands:
% module load hpcpi
% setenv HPCPIDB my_directory
% mkdir -p $HPCPIDB
% hpcpid
Step 2: Establishing the Label and Running the Application
The hpcpictl label command establishes a label and runs the specified binary. By default,
HPCPI associates all performance data for the process with the specified label. The label is active
until the process terminates.
To establish the label myLabel, run myApp, and associate all data for that process with myLabel,
enter the following command:
% hpcpictl label myLabel ./myApp
Step 3: Flushing the Data
To flush the HPCPI data to disk after the application completes, enter the following command:
% hpcpictl flush
This ensures the HPCPI data is written to disk and visible to the HPCPI analysis tools.
Step 4: Using the Label with hpcpiprof
The hpcpiprof utility supports the -label label_name option to isolate data associated
with a label. When you enter the hpcpiprof command without an image name, it displays
per-image statistics for all images on the system. The image with the most CPU cycles used is
typically the kernel (vmlinux-2.6.9-34.7hp.XCsmp), as shown in the following example
from a lightly loaded system:
% hpcpiprof
Event Name  Events         Period  Samples
----------  -------------  ------  ---------
CPU_CYCLES  7969037220000  60000   132817287

CPU_CYCLES  %      cum%   image
----------  -----  -----  --------------------------
385649e7    48.4%  48.4%  vmlinux-2.6.9-34.7hp.XCsmp
198708e7    24.9%  73.3%  libm-2.3.4.so
192510e7    24.2%  97.5%  myApp
:
:
If you run the same hpcpiprof command and specify the label name (hpcpiprof -label
myLabel), hpcpiprof displays event counts for code executed in all images for myApp, such
as code in shared libraries called from myApp. An extract of the output is as follows:
% hpcpiprof -label myLabel
Event Name  Events         Period  Samples
----------  -------------  ------  --------
CPU_CYCLES  3914574240000  60000   65242904

CPU_CYCLES  %      cum%    image
----------  -----  ------  --------------------------
198708e7    50.8%   50.8%  libm-2.3.4.so
192510e7    49.2%   99.9%  myApp
   192e7     0.0%  100.0%  vmlinux-2.6.9-34.7hp.XCsmp
:
:
Using the label shows that of the CPU cycles used for myApp, the majority of the CPU cycles
were used in the libm-2.3.4.so library. To display which procedures in the libm-2.3.4.so
library were used by myApp and the number of CPU cycles used in each procedure, enter the
following command:
% hpcpiprof -label myLabel libm-2.3.4.so
The output is as follows:
libm-2.3.4.so: not found.
 + Found and using /lib/tls/libm-2.3.4.so

Event Name  Events         Period  Samples
----------  -------------  ------  --------
CPU_CYCLES  1987076640000  60000   33117944

CPU_CYCLES  %      cum%    procedure        image
----------  -----  ------  ---------------  -------------
100456e7    50.6%   50.6%  __ieee754_sqrtf  libm-2.3.4.so
 98252e7    49.4%  100.0%  __logf           libm-2.3.4.so
The hpcpiprof utility prints the statement libm-2.3.4.so: not found because it first
searches for the libm-2.3.4.so image in the current working directory and fails. It finds and
uses the libm-2.3.4.so image in the path /lib/tls/libm-2.3.4.so, as indicated in the
subsequent statement.
For descriptions of the output data, see “Viewing Per-Procedure Data: hpcpiprof image_name”
(page 46).
Step 5: Stopping the HPCPI Daemon
When you are done collecting data, enter the following command to stop the HPCPI daemon:
% hpcpictl quit
Label Selectors
Using the hpcpictl label command in its simplest form is sufficient if you are executing and
monitoring a single process that is executed directly from a run string. To monitor groups of
processes or processes that are started indirectly, you can specify label selectors.
When you specify selectors, HPCPI associates data from all processes that match the selectors
with the label, independent of the process launched by the command in the run string. Using
selectors enables you to do the following:
• Select data for the label from multiple processes or processes other than the process hpcpictl
starts.
• Have hpcpictl execute a utility that controls the duration of the label.
For example, you can specify selectors and a command that has a fixed runtime, such as the
sleep command. Another method is to specify selectors and a command that runs until
stopped by user action, such as the cat command with no input. The cat process uses the
standard input device (stdin) as the input, and continues to run until you close stdin or
explicitly terminate the cat process.
The hpcpictl label command syntax with selectors is as follows:
hpcpictl label label_name selectors command [arg...]
The selectors can be one or more of the following:
-pid process_id          Selects events for the specified process ID (PID).
-pid this                Selects events for the process launched for the specified
                         command.
-ppid parent_process_id  Selects events for all processes with the specified parent PID.
-ppid this               Selects events for all processes with a parent PID that is the
                         PID of the process launched for the specified command.
-pgid process_group_id   Selects events for all processes with the specified process
                         group ID.
-pgid this               Selects events for all processes with the same process group
                         ID as the process group ID of the process launched for the
                         specified command.
-sid session_id          Selects events for all processes with the specified session ID.
-sid this                Selects events for all processes with the same session ID as
                         the session ID of the process launched for the specified
                         command.
-uid user_id             Selects events for all processes with the specified user ID.
                         The uid selector is useful when using a utility such as
                         mpirun that creates a new session and process group outside
                         the process tree of the processes it launches. For information
                         about using the uid selector in a cluster environment, see
                         "Starting a Distribution Utility from hpcpictl label"
                         (page 73).
-uid this                Selects events for all processes with the same user ID as the
                         user ID for the process launched for the specified command.
-cpu cpu_number          Selects all events for the specified CPU number.
Selector Operators
You can specify the following operators with selectors:
• The unary postfix operator -not
• The following binary postfix operators:
  -and
  -or
  -equiv
-not Operator
The unary postfix operator -not negates the specification. The following example uses the -not
operator to select events for nonsuperuser processes:
% hpcpictl label nonsuper -uid 0 -not sleep 30
This selects systemwide events for nonsuperuser processes (processes that do not have UID 0)
for 30 seconds (the runtime for the sleep 30 process) and associates them with the label
nonsuper.
-and Operator
The binary postfix operator -and selects processes for which both selectors are true.
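For example, the following command (using a hypothetical UID of 1001 and a hypothetical
label name) selects events for processes that have UID 1001 and run on CPU 0, for 30 seconds:
% hpcpictl label mine0 -uid 1001 -cpu 0 -and sleep 30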
-or Operator
The binary postfix operator -or selects processes if either selector is true.
-equiv Operator
The binary postfix operator -equiv selects processes that have the same boolean value for the
selector.
You can follow the -equiv operator with a -not operator to perform an XOR operation
(selector_1 selector_2 -equiv -not).
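For example, the following command (the label name is hypothetical) selects events for
processes for which exactly one of the two conditions is true (UID 0, or running on CPU 0),
for 30 seconds:
% hpcpictl label xor30 -uid 0 -cpu 0 -equiv -not sleep 30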
NOTE: HP would like to know if you use the -equiv and -equiv -not operators. If you do,
contact HP using the email address in the HPCPI and Xtools Release Notes.
Operator Syntax
Specify the operators and operands using Reverse Polish Notation (RPN), or postfix notation. A
binary postfix operator follows the two operands (selectors).
For example, the following command selects data for processes with PID 51 or 52 for 30 seconds
(the runtime for the sleep 30 process). HPCPI does not include data from the sleep process
with the label:
% hpcpictl label two -pid 51 -pid 52 -or sleep 30
To select data from three processes by adding data from the process with PID 53, enter the
following command:
% hpcpictl label three -pid 51 -pid 52 -pid 53 -or -or sleep 30
Operands and operators are stacked. For example, the user enters the following command:
% hpcpictl label comps -pgid this -pid this -not -and make all
The hpcpictl utility processes the selectors in the following order:
(-pgid this (-pid this -not) -and)
This causes hpcpictl to select data for all processes spawned by make (processes in the same
process group, as specified by -pgid this), but not data for the make process itself (-pid
this -not).
Label Selectors
63
Multiple Labels
An event can be recorded in only one data set, that is, one label. If you have multiple labels
defined and a process matches the selectors for more than one label, the events for that process
are recorded in only one data set, and which data set receives them is indeterminate.
Reusing Labels
You can specify the same label name in multiple hpcpictl commands. Each instance of the
hpcpictl label command creates a separate data set, and HPCPI aggregates all data sets
with the same specified label name and within the same epoch. For example, the user enters the
following commands:
% hpcpictl label mylabel myApp one
% hpcpictl label mylabel myApp two
% hpcpictl label mylabel myOtherApp bbb
You can specify the label mylabel as an option (-label mylabel) for HPCPI analysis tools
to select the performance data for all three processes.
The maximum number of concurrent data sets is 16.
Comparing Epochs and Labels
An epoch is a time-based boundary in the HPCPI database. An epoch does not isolate data
according to process ID or any other process characteristics.
The following is an example of a user session using epochs to isolate performance data for code
executed from libc-2.3.4.so by myApp:
% hpcpictl epoch
% myApp
% hpcpictl epoch
% hpcpiprof -epoch latest-1 libc-2.3.4.so
One disadvantage of using epochs only is that you cannot isolate events from shared images
executed from the user application. In this session, the command hpcpiprof -epoch latest-1
libc-2.3.4.so reports statistics for all code executed from the libc-2.3.4.so library during
the execution of myApp, which may include data for libc-2.3.4.so code executed by processes
other than the myApp process.
By comparison, the following is an example of a user session using labels to isolate performance
data for myApp:
% hpcpictl label myLabel myApp
% hpcpictl flush
% hpcpiprof -label myLabel libc-2.3.4.so
The command hpcpiprof -label myLabel libc-2.3.4.so reports statistics only for
libc-2.3.4.so code executed by the myApp process, unlike the hpcpiprof -epoch
latest-1 libc-2.3.4.so command in the previous example.
Using Epochs with Labels
You can use epochs with labels. If you have a label established and start another epoch, the labels
are retained across epochs (the label remains active until the command executed by the hpcpictl
utility completes).
NOTE: HP would like to know if you use labels and epochs concurrently to partition data. If
you do, contact HP using the email address in the HPCPI and Xtools Release Notes.
Label Examples
This section contains HPCPI label examples.
Existing Processes: -pid pid
You can use the ps utility to determine the PID of an existing process and use the -pid pid
selector to attach a label to performance data for that process. In the following example, you
want to collect data for the process with PID 5515 and use the sleep 99999 command to keep
the label active:
% hpcpictl label myLabel -pid 5515 sleep 99999
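For example, you might determine the PID with a ps command such as the following (the
process name is hypothetical):
% ps -ef | grep myApp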
Using Labels with Application Arguments
You can run a program with different argument sets and assign unique labels for each execution.
This enables you to compare performance data for different argument sets. In the following
example, you first run HPCPI to execute the program myApp with the argument -normal and
assign the HPCPI label norm_run to the event data generated. The second command uses
HPCPI to execute myApp with the argument -optimized and assign the HPCPI label opt_run
to the event data.
% hpcpictl label norm_run myApp -normal
% hpcpictl label opt_run myApp -optimized
Using the Application Argument in the Label Name
The following shell script creates a new label for each run of an application. It uses the same
script variable for part of the label name and as the argument passed to the application.
foreach n ($sizes)
    hpcpictl label size_$n myApp $n
end
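The sizes variable is assumed to be set by the surrounding script. For example, a csh user
might define a hypothetical list of problem sizes as follows:
% set sizes = (128 256 512)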
Utilities that Spawn Processes: -pgid this
The make and runspec utilities typically spawn multiple processes. The following examples
use the selector -pgid this to associate all spawned processes with the specified labels:
% hpcpictl label make_all -pgid this make all
% hpcpictl label all_runspec -pgid this runspec
If a spawned process explicitly changes its process group, its events will no longer be associated
with the label, but there can be a delay between the time its process group changes and when
hpcpid detects the change.
Spawned Processes without the Originator: -pgid this -pid this -not
To select data from the processes make spawns (the compilers) but exclude data from the make
process itself, enter the following command:
% hpcpictl label comp_label -pgid this -pid this -not -and make all
All Processes: -pid -1 -not
The selector -pid -1 -not specifies all processes. The -pid -1 specifies a process with PID
-1, but no process meets this specification. The -not operator negates this and selects all processes.
You can use this selector with the sleep utility to measure all processes on a system for a fixed
time period. For example:
% hpcpictl label all30 -pid -1 -not sleep 30
This command captures data for all processes on the system for 30 seconds (the duration of the
sleep 30 command).
Alternatively, you can use the sleep 99999 command and manually terminate the sleep
process when you are done taking measurements. For example:
% hpcpictl label all -pid -1 -not sleep 99999
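When you finish taking measurements, you can terminate the sleep process from another shell.
For example (the PID shown is hypothetical; use ps to find the actual PID):
% kill 12345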
You can also use this selector with the srun utility in a cluster environment to capture data for
all processes on the local system for the duration of the srun execution. This would include the
daemon started by srun and any user processes that srun launches on the local system. It does
not include data for processes that are launched on remote systems because labels do not cross
machine boundaries. For example:
% hpcpictl label all_srun -pid -1 -not srun myMPI
Kernel Idle Data: -pid 0
On Linux systems, PID 0 is the kernel idle process. The following command selects data for the
idle kernel for 20 seconds (the duration of the sleep 20 command) and associates it with the
label idle20:
% hpcpictl label idle20 -pid 0 sleep 20
Kernel Idle Data Per CPU
The following commands enable you to distinguish kernel idle data for CPU 0 from idle kernel
data for other CPUs:
% hpcpictl label idleCPU0 -pid 0 -cpu 0 -and sleep 20 &
% hpcpictl label idleCPUn -pid 0 -cpu 0 -not -and sleep 20 &
% wait
You can also chain hpcpictl label commands by executing an hpcpictl label command
from another hpcpictl label command. The following command is equivalent to the three
commands in the previous example:
% hpcpictl label idleCPU0 -pid 0 -cpu 0 -and \
hpcpictl label idleCPUn -pid 0 -cpu 0 -not -and \
sleep 20
Creating Labels in Programs
You can use a function such as popen() to invoke the hpcpictl label command within an
application and assign a label to specific code areas in the application.
For example, you can profile the execution phase of an application only, without the initialization,
reporting, or finalization phases. This is analogous to benchmarks, which typically report results
for only the execution phase. In addition, you can use different labels to distinguish major phases
of an application, such as initialization and multiple subphases of execution.
Invoke the hpcpictl label command from within the application to assign labels to correspond
to the different application phases. One way to invoke hpcpictl label is to write a C routine
that is similar to the following:
char command[1024];
FILE *labeler;

sprintf(command, "/opt/hpcpi/bin/hpcpictl label %s -pgid this /bin/cat",
        labelName);
labeler = popen(command, "w");
sleep(1);
In this example, hpcpictl starts a cat process, which is used only to keep the HPCPI label
active.
The sleep() call introduces a delay in the calling process that helps HPCPI partition events
you want to select for the label.
In addition, this example specifies -pgid this as the label selector, which selects all executables
with the same process group ID as the cat process, including the calling program. Alternatively,
you can specify -pid %d with the return value from getpid() to select only the calling
program.
An application can also use environment variable values to enable or disable the use of HPCPI
labels, and to provide the label name or a portion of the label name.
C Code Example
The following do_hpcpi_label() function was inserted in a C program to control HPCPI
labels. The program was invoked from a shell script as follows:
foreach n ($sizes)
    env HPCPI_LABEL=n$n a.out $n
end
The C code is as follows:
#include <stdio.h>
#include <stdlib.h>
:
:
static void
do_hpcpi_label(int start)
{
    static FILE * labeler = NULL;
    if (labeler) {
        pclose(labeler);
        labeler = NULL;
        sleep(1);
    }
    if (start) {
        const char * label = getenv("HPCPI_LABEL");
        if (label) {
            char command[1024];
            const char * hpcpictl = "/opt/hpcpi/bin/hpcpictl";
            sprintf(command, "%s label %s -pid %d /bin/cat",
                    hpcpictl, label, getpid());
            labeler = popen(command, "w");
            if (labeler) {
                sleep(1);
            } else {
                perror("popen()");
            }
        }
    }
}
Notes
Note the following items:
• The first if block terminates an existing label process. This block provides a locking
  mechanism and is included for applications that use multiple phases or start and stop labels
  multiple times during execution.
• You can construct the label name using environment variable values, numeric function
  arguments (such as problem size or phase number), or text function arguments (such as
  data set name or phase name).
• In this example, the HPCPI_LABEL environment variable controls the use of HPCPI labels
  in the application. If the HPCPI_LABEL environment variable is not set, no HPCPI label is
  started. The HPCPI_LABEL environment variable also provides the label name.
• The do_hpcpi_label() function only stops or starts the HPCPI label. The action (start or
  stop) is controlled by the value of the start argument passed to the function. This enables
  you to declare the labeler as a function-static variable.
• Placing the code that starts hpcpictl label and related code in a separate function
  (do_hpcpi_label()) makes the invoking code cleaner.
In the application, a start call (do_hpcpi_label(1)) is inserted in the main code just prior
to the timing run, and a stop call (do_hpcpi_label(0)) just after the timing run.
Fortran Code Example
The following is a C-language Fortran-callable routine to label the phases of an application. It
uses a traditional convention for name manipulation and executing C code from Fortran; you
might need to adjust it for some compiler environments.
#include <stdio.h>
#include <stdlib.h>
:
:
void
notephase_(char name[], int namelen)
{
    static int manage_labels = -1;
    static FILE * labeler = NULL;
    if (labeler) {
        pclose(labeler);
        labeler = NULL;
    }
    if (manage_labels < 0)
        manage_labels = (getenv("MANAGE_HPCPI_LABELS") != NULL);
    if (manage_labels && name && (namelen > 0)) {
        char command[1024];
        const char * hpcpictl = "/opt/hpcpi/bin/hpcpictl";
        sprintf(command, "%s label %.*s -pid %d /bin/cat",
                hpcpictl, namelen, name, getpid());
        labeler = popen(command, "w");
        sleep(1);
    }
}
6 Using HPCPI on an HP XC Cluster
This chapter describes additional procedures for using HPCPI on an HP XC cluster. This chapter
addresses the following topics:
• “Overview” (page 69)
• “Collecting Data on Multiple Nodes” (page 70)
• “Collecting Data on One Node” (page 73)
Overview
When using HPCPI on an HP XC cluster you can do the following:
• Collect performance data from some or all nodes in the job allocation.
• Collect performance data from one node in the job allocation.
To collect performance data from all nodes in the job allocation, the hpcpid daemon must run
on all nodes in the job allocation. In addition, you can consolidate the data into a single database
and synchronize the epochs.
If you collect performance data from only one node in the job allocation, the hpcpid daemon
only needs to run on this node. If you are using HPCPI labels, you must execute the hpcpictl
label command only on the node on which hpcpid is running, and use label selectors to
associate the label with data from the application instead of the mpirun or other distribution
utility process, as described in “Starting a Distribution Utility from hpcpictl label” (page 73).
Using Labels with mpirun and Other Distribution Utilities
Many distribution utilities (utilities that start programs on multiple nodes, such as mpirun)
create a new session and process group for the processes they launch, including processes
launched on the local system.
There are two ways to establish HPCPI labels with distribution utilities:
• Execute the hpcpictl label command from the distribution utility as follows:
  % mpirun ... hpcpictl label myLabel... myApp
  Use this method to collect performance data from all nodes in the job allocation. You must
  start the hpcpid daemon on all nodes in the job allocation if it is not already running, and
  in most cases you will synchronize the database and epoch. For more information, see
  "Collecting Data on Multiple Nodes" (page 70).
• Execute the distribution utility from the hpcpictl label command as follows:
  % hpcpictl label myLabel... mpirun... myApp
  Use this method to collect performance data from only one node in the job allocation. The
  hpcpid daemon needs to run only on the node where you are collecting data; it does not
  have to run on the other nodes in the cluster. For more information, see "Collecting Data
  on One Node" (page 73).
Collecting Data on Multiple Nodes
This section describes the tasks you must perform to collect data on multiple nodes, and includes
an example using HP-LSF, SLURM, and MPI.
Consolidating and Synchronizing Data
If you are collecting performance data from all nodes in your job allocation, you can consolidate
the HPCPI data in one database and in one epoch. By default, each hpcpid daemon starts a new
epoch. To consolidate and synchronize the data, follow these steps:
1. Select and create a directory for the HPCPI database that is shared by all nodes in the cluster.
   You must also have write permission for the directory (the HPCPI daemon uses your user
   ID when writing data to the database). For additional requirements, see "Selecting a Location
   for the HPCPI Database Directory" (page 36).
2. Set the HPCPIDB environment variable to the selected database, or specify the database
   using the -db option in the hpcpid and HPCPI utilities.
3. On one node (such as the current login node), create a new epoch by entering the following
   command:
   % hpcpid -create-epoch
   This command also terminates the hpcpid daemon after creating the epoch. This enables
   you to start the daemon using the same command on all nodes in the following step.
4. Start the daemon on the nodes you want to monitor so that each daemon uses the existing
   epoch (the epoch you created in step 3) with the following command:
   % hpcpid -epoch
   You can use this command in a job file that is executed on all nodes, as shown in "prolog
   File" (page 71).
Selecting Output Data for Specific Systems
By default, the hpcpiprof, hpcpilist, and hpcpitopcounts utilities search for profile files
in all system subdirectories in the epoch. In a cluster environment with a consolidated HPCPI
database and synchronized epochs, the utilities find profile files for multiple systems, and display
aggregate data values. This is useful for analyzing performance data for the cluster as a single
entity.
To view data from individual nodes, use the -hosts option with the hpcpiprof, hpcpilist,
or hpcpitopcounts utility, as described in “Selecting Data by System” (page 51).
Example Using HP-LSF, SLURM, and MPI
This section contains an example with commands and files used to run an MPI application using
HP-LSF and SLURM, and to collect HPCPI data from all nodes in the job allocation.
NOTE: SLURM syntax and operation are subject to change. The contents of this example are
provided only as guidelines.
RMS users can use RMS commands and mechanisms to start and control HPCPI.
Creating the Common HPCPI Directory and Epoch
On one system, create the HPCPI directory. The directory must be accessible by all nodes in the
cluster. Then create an epoch. For example:
% setenv HPCPIDB ~/hpcpidb
% mkdir -p $HPCPIDB
% hpcpid -create-epoch
Submitting the Job
Use the HP-LSF bsub command to submit the following job:
% bsub -n num_nodes \
mpirun -srun \
--task-prolog=`pwd`/slurm.task-prolog.hpcpi \
--task-epilog=`pwd`/slurm.task-epilog.hpcpi \
myApp myArgs
The num_nodes is the number of nodes for the job, myApp is the name of the MPI application,
and myArgs are any arguments for the MPI application.
The prolog file is slurm.task-prolog.hpcpi and the epilog file is
slurm.task-epilog.hpcpi. Both files are located in the current working directory, and
prefacing the file names with `pwd`/ ensures that SLURM can locate the files.
The prolog file starts the hpcpid daemon on all nodes in the job allocation.
To use an HPCPI label, run the hpcpictl label command from the mpirun utility. Replace
the myApp myArgs run string in the example with an hpcpictl label command that launches
myApp as follows:
hpcpictl label myLabel [label_selectors] myApp myArgs
prolog File
The contents of the prolog file (slurm.task-prolog.hpcpi) are as follows:
#!/bin/csh -f
if ( ! $?SLURM_LOCALID ) then
    exit
endif
if ( ! $?SLURM_TASK_PID ) then
    exit
endif
# Only start the HPCPI daemon from one task per node.
#
if ( $SLURM_LOCALID == 0 ) then
    #
    # Start hpcpid with the -terminate-with option to ensure that
    # hpcpid terminates when the SLURM job finishes, in case
    # the epilog doesn't run or some other catastrophe.
    #
    # We want hpcpid to terminate with the task. $SLURM_TASK_PID is the pid of
    # the slurmstepd. Its parent, the one we want to terminate with,
    # is the initial slurmstepd on this node for this task.
    #
    # -epoch uses the current epoch, so each node will use the
    # previously created epoch.
    # The >& redirection of the output is useful for logging
    # and debugging, but is also used because SLURM
    # expects script output in the form VAR=VAL pairs.
    #
    set termWithPID=`ps --no-heading --format ppid -p $SLURM_TASK_PID`
    hpcpid -terminate-with $termWithPID -epoch >& $HPCPIDB/task-prolog.`hostname`.$$
endif
# Each task should wait until the HPCPI daemon is up
#
foreach try (1 2 3 4 5 6 7 8 9 10)
    sleep 1
    hpcpictl show >& /dev/null
    if ( $status == 0 ) then
        exit
    endif
end
In normal operation, the daemon is terminated by code in the epilog script. The prolog script
starts hpcpid with the option -terminate-with pid as a contingency method to terminate
the daemon when the specified PID process terminates. In this case, pid is the PID of the initial
slurmstepd on the node for this task.
By default, the -terminate-with option does not flush HPCPI data to disk before terminating
the daemon. You can specify the -doflush option with the -terminate-with option to flush
the data before terminating the daemon. In this example, the -terminate-with option is used
only as a contingency method to kill the daemon, and the -doflush option is not specified based on
the assumption that the -terminate-with option takes effect only if something fails in the
job.
The script specifies the hpcpid option -epoch to use the current epoch instead of starting a
new epoch.
The script writes the hpcpid startup information to the log file
task-prolog.hostname.shell_pid in the HPCPI database. You can use this information
to analyze possible problems.
epilog File
The contents of the epilog file slurm.task-epilog.hpcpi are as follows:
#!/bin/csh -f
if ( ! $?SLURM_LOCALID ) then
    exit
endif
# Only quit from one task per node.
#
if ( $SLURM_LOCALID == 0 ) then
    hpcpictl quit >>& $HPCPIDB/task-epilog.`hostname`.$$
endif
Collecting Data on One Node
To collect data on one node in a cluster environment, you can use the procedures described in
Chapter 3 (page 31) and Chapter 4 (page 35) with the following additional guidelines:
• Do not start the hpcpid daemon on each node.
• Do not start the daemon from a script that is executed on every node.
• If you are using the hpcpictl label command, execute the distribution utility (such as
  mpirun) from the hpcpictl label command (hpcpictl label...mpirun...).
• Do not execute the hpcpictl label command from the distribution utility
  (mpirun...hpcpictl label...).
Starting a Distribution Utility from hpcpictl label
When you collect performance data from only one node in the job allocation and use an HPCPI
label, start the distribution utility from the hpcpictl label command. HPCPI establishes the
label only on one node, and the hpcpid daemon needs to be running on that node only.
If you start the distribution utility from the hpcpictl label command and the utility creates
a new process group for the application it launches, you cannot use the -pid this or -pgid
this selectors to capture data from the application. Instead, you can use the -uid option to
associate the label with events from processes running with your UID. For example:
% hpcpictl label mylabel -uid this mpirun [mpi_opts] myApp
The data associated with the label includes the overhead data for the distribution utilities (the
mpirun and scheduler processes) that run with your UID, but this data is trivial when compared
to the data for most applications.
7 Using Xtools
This chapter describes how to use xclus, xcxclus, xperf, and xcxperf. This chapter addresses
the following topics:
• “Xtools Overview” (page 76)
• “Using xclus and xcxclus” (page 76)
• “Starting xclus and xcxclus” (page 78)
• “Viewing xclus and xcxclus Displays” (page 81)
— “Viewing xclus (Enhanced) Itanium Icons” (page 82)
— “Viewing xclus (Enhanced) Single-Core and Dual-Core AMD Opteron Node Icons”
(page 83)
— “Viewing xclus (Enhanced) Native Quad-Core AMD Opteron Node Icons” (page 84)
— “Viewing xcxclus (Generic) Node Icons” (page 85)
— “Showing Statistic Names and Descriptions” (page 86)
— “Showing Bandwidth or Utilization Rates” (page 86)
— “Showing HyperTransport Data Statistics or Data and Control Statistics” (page 86)
— “Changing the Refresh Rate” (page 86)
— “Hiding Statistic Values” (page 86)
— “Suspending the Display” (page 86)
— “Modifying the Display Size and Layout” (page 86)
— “Using Enhanced (xclus) Menu Options” (page 87)
— “Using Generic (xcxclus) Menu Options” (page 87)
• “Recording, Replaying, and Plotting xclus and xcxclus Data” (page 89)
• “Starting xperf or xcxperf from xclus or xcxclus” (page 92)
• “Viewing Grouped Nodes” (page 92)
• “Using xperf and xcxperf” (page 94)
• “Starting xperf and xcxperf” (page 94)
• “Viewing xperf and xcxperf Displays” (page 95)
— “Viewing Itanium xperf (Enhanced) Statistics” (page 96)
— “Viewing AMD Opteron xperf (Enhanced) Statistics” (page 98)
— “Viewing xcxperf (Generic) Statistics” (page 101)
— “Displaying Color Legends and Creating Tear-Away Legends” (page 104)
— “Hiding or Showing Graphs” (page 104)
— “Showing I/O Bandwidth or Utilization Rates” (page 104)
— “Showing Cycles Per Instruction or Instructions Per Cycle” (page 104)
— “Modifying Graph Colors and Line Widths” (page 104)
— “Using xperf (Enhanced) Menu Options” (page 104)
— “Using xcxperf (Generic) Menu Options” (page 105)
• “Starting an HPCPI Label from xperf” (page 106)
• “Viewing Generic Data with xclus or xperf” (page 109)
• “Viewing Enhanced Data with xcxclus or xcxperf” (page 110)
• “Xtools Daemons” (page 111)
Xtools Overview
The Xtools utilities are X11 clients with GUIs that enable you to monitor the performance of
multiple systems and individual systems. The Xtools bundle consists of the following utilities:
• xclus and xcxclus
The xclus and xcxclus utilities enable you to monitor performance and resource utilization
for multiple systems or nodes in a cluster. By default, xclus displays processor-specific
statistics (enhanced statistics) for the processors listed in Table 1-1 (page 20), and xcxclus
displays processor-independent statistics (generic statistics) for the processors listed in
Table 1-2 (page 20).
Table 1-3 (page 21) lists the statistics that xclus and xcxclus display.
On cluster systems, the xclus and xcxclus utilities monitor only the nodes in your job
allocation by default. When monitoring nodes in your job allocation, the utilities verify that
the nodes in their displays are in your job allocation. When the utilities detect that a node is no
longer in your job allocation, they stop displaying information about that node. When a
utility detects that you have no nodes in your job allocation, it terminates.
• xperf and xcxperf
The xperf and xcxperf utilities enable you to monitor performance and resource utilization
for individual systems. By default, xperf displays enhanced statistics for the processors
listed in Table 1-1 (page 20), and xcxperf displays generic statistics for the processors
listed in Table 1-2 (page 20).
Table 1-4 (page 23) lists a summary of the statistics that xperf and xcxperf display. For
detailed lists of xperf and xcxperf statistics, see “Viewing Itanium xperf (Enhanced)
Statistics” (page 96), “Viewing AMD Opteron xperf (Enhanced) Statistics” (page 98), and
“Viewing xcxperf (Generic) Statistics” (page 101).
The Xtools also include the following daemons:
apmond
clusmond
The Xtools also use the following HP XC daemons:
mond
supermon
For more information about these daemons, see “Xtools Daemons” (page 111).
Using xclus and xcxclus
The following sections describe general procedures for using xclus and xcxclus. The xclus
and xcxclus utilities are very similar, and the procedures for using them are the same, except
for the following differences:
• The xclus utility displays processor-specific (enhanced) data by default. The xcxclus
  utility displays processor-independent (generic) data by default. Table 1-3 (page 21) lists
  the information displayed by each utility.
  The sections “Viewing Generic Data with xclus or xperf” (page 109) and “Viewing
  Enhanced Data with xcxclus or xcxperf” (page 110) describe how to display alternate
  data sets.
• The xclus and xcxclus utilities support the -unrestricted-nodes option that enables
  you to specify and monitor nodes outside your job allocation. The xcxclus utility requires
  superuser privileges to use the -unrestricted-nodes option. The xclus utility does
  not require superuser privileges to use the -unrestricted-nodes option and supports
  the -unrestricted-nodes option for all users.
• On non-cluster systems, you must specify the nodes you want xclus to monitor. You do
  not need to specify the -unrestricted-nodes option when running xclus on non-cluster
  systems.
Starting xclus and xcxclus
To start xclus or xcxclus, follow these steps:
1. Set up the Xtools environment.
2. Set the DISPLAY environment variable.
3. Start the xclus or xcxclus program. If you are using xclus and are running on a
   non-cluster system or do not have a job allocation, you must specify the nodes you want to
   monitor.
Step 1: Setting Up the Xtools Environment
On systems with the modules utility, enter the following command to set up the Xtools
environment:
% module load xtools
Alternatively, you can manually add the Xtools binary directory (/opt/xtools/bin) to your
PATH environment variable, and /opt/xtools/man to your MANPATH. For example:
% setenv PATH /opt/xtools/bin:$PATH
% setenv MANPATH /opt/xtools/man:$MANPATH
Step 2: Setting the DISPLAY Environment Variable
The xclus and xcxclus utilities are X11 clients. You must set the DISPLAY environment variable
so the utility can open the appropriate X11 display device. For example:
% setenv DISPLAY myhost:0.0
Step 3: Starting xclus or xcxclus
This section describes how to start xclus or xcxclus.
Starting xclus
On HP XC cluster systems, you can start xclus by entering the following command:
% xclus
By default, xclus determines the nodes in your job allocation when it starts and monitors those
nodes. If a node leaves your job allocation, xclus stops monitoring that node. If you are not
running any jobs, the xclus utility terminates.
Specifying Nodes with xclus
On non-cluster systems, you must specify the nodes you want xclus to monitor. On HP XC
cluster systems, you can specify alternate nodes for xclus to monitor, such as a subset of the
nodes in your job allocation. See “Specifying Nodes for xclus or xcxclus” (page 79) for
information about specifying nodes for xclus.
Starting xcxclus
To start xcxclus, enter the following command:
% xcxclus
By default, xcxclus determines the nodes in your job allocation when it starts and monitors
those nodes. If a node leaves your job allocation, xcxclus stops monitoring that node. If you
are not running any jobs, the xcxclus utility terminates.
Specifying Nodes with xcxclus
By default, you do not need to specify the nodes you want to monitor with xcxclus, and
xcxclus monitors all the nodes that are in your job allocation when it starts. However, you can
specify nodes with xcxclus to:
• Monitor a subset of nodes in your job allocation.
• Monitor nodes outside of your job allocation. You specify the option -unrestricted-nodes
  and must have superuser privileges to do this.
See “Specifying Nodes for xclus or xcxclus” (page 79) for information about specifying nodes
for xcxclus.
Specifying Nodes for xclus or xcxclus
On HP XC cluster systems, you do not have to specify the nodes you want to monitor and xclus
or xcxclus monitors the nodes in your job allocation. However, specifying nodes is useful in
the following cases:
• You are running xclus on a non-cluster system. In this case, you must specify the nodes
  you want to monitor.
• You want to monitor a subset of the nodes in your job allocation.
• You want to monitor nodes outside of your job allocation. If you are on an HP XC cluster
  system, you must also specify the option -unrestricted-nodes. If you are using xcxclus,
  you must have superuser capability to specify this option.
Use one of the following methods to specify nodes:
• Specify the names of the nodes you want to monitor using the -nodes option.
• Create a cluster file and specify the name of the cluster file using the -cluster option.
• Create a cluster file in the current working directory with the name cluster. This method
  is successful only if you are running xclus on a standalone system.
Specifying Nodes with -nodes
To specify node names in the run string, use the following syntax:
xclus|xcxclus -nodes node_specification[,node_specification...]
Where:
node_specification  Is a node name, or a node name prefix followed by [number_specs].
number_specs        Is a comma-separated list of ranges, and a range is a singleton or a
                    dash-separated pair. If you use the [number_specs] form, use the
                    appropriate quotation symbols for your shell.
For example:
% xclus -nodes node1,node2,node3,node4
The following commands are equivalent to each other:
% xclus -nodes 'c1,c3,c5,c6,c7,c9,c12'
% xclus -nodes 'c1,c[3,5-7,9],c12'
% xclus -nodes 'c[1,3,5-7,9,12]'
Creating a Cluster File
A cluster file contains node names, with one node name per line. For example, to monitor node1,
node2, node3, and node4, create the file cluster that contains the following text:
node1
node2
node3
node4
Specifying the Cluster File Name with the -cluster Option
If the cluster file is not named cluster or is not located in the current working directory, you
must use the -cluster option to specify the name of the cluster file. For example, if the cluster
file name is my_cluster_file, start xclus by entering the following command:
% xclus -cluster my_cluster_file
If you are on an HP XC cluster and the nodes in the cluster file are not in your job allocation, you
must also specify the -unrestricted-nodes option as follows:
% xclus -cluster my_cluster_file -unrestricted-nodes
If you are using xcxclus, you must have superuser capability to specify the
-unrestricted-nodes option.
Using the xclus Default Cluster File Location
If you are running xclus on a standalone system and do not specify the -nodes or -cluster
option, xclus searches the current working directory for a cluster file named cluster.
Viewing xclus and xcxclus Displays
Figure 7-1 shows an xclus display for four Itanium systems. To view an xclus display with
AMD Opteron systems, see Figure 1-1 (page 21).
Figure 7-1 xclus Display for Itanium Systems
Each icon shows data for one node. By default, xclus or xcxclus displays one icon for each
node if it is monitoring fewer than 64 nodes; otherwise each icon represents a group of nodes
with similar performance statistics, as described in “Viewing Grouped Nodes” (page 92).
Each node icon has rectangle and arrow subareas for different system statistics. Most statistics
are utilization statistics, shown as a percentage of the total resource utilization. The subareas are
color coded according to the value of the current statistic, and are updated once a second by
default. The key for the color coding is at the bottom of the window.
By default, xclus displays enhanced (processor-dependent) information and the icons and
statistics shown in each subarea vary according to the type of processor monitored. By default,
xcxclus displays generic (processor-independent) information that is the same for all node
types. The statistics for the different node displays are described in “Viewing xclus (Enhanced)
Itanium Icons” (page 82), “Viewing xclus (Enhanced) Single-Core and Dual-Core AMD Opteron
Node Icons” (page 83), “Viewing xclus (Enhanced) Native Quad-Core AMD Opteron Node
Icons” (page 84), and “Viewing xcxclus (Generic) Node Icons” (page 85).
The following subsections describe how to perform these tasks:
• “Viewing xclus (Enhanced) Itanium Icons” (page 82)
• “Viewing xclus (Enhanced) Single-Core and Dual-Core AMD Opteron Node Icons”
(page 83)
• “Viewing xclus (Enhanced) Native Quad-Core AMD Opteron Node Icons” (page 84)
• “Viewing xcxclus (Generic) Node Icons” (page 85)
• “Showing Statistic Names and Descriptions” (page 86)
• “Showing Bandwidth or Utilization Rates” (page 86)
• “Showing HyperTransport Data Statistics or Data and Control Statistics” (page 86)
• “Changing the Refresh Rate” (page 86)
• “Hiding Statistic Values” (page 86)
• “Suspending the Display” (page 86)
• “Modifying the Display Size and Layout” (page 86)
• “Using Enhanced (xclus) Menu Options” (page 87)
• “Using Generic (xcxclus) Menu Options” (page 87)
Viewing xclus (Enhanced) Itanium Icons
By default, xclus displays enhanced icons for Itanium processors. Figure 7-2 shows an enhanced
icon for a node with two single-core Itanium processors.
Figure 7-2 Itanium xclus Display
The callouts in the figure identify the following subareas:
1.  Node designator.
2.  Utilization rates for cores 0 and 1.
3.  FSB bus utilization rate.
4.  MID bus utilization rate.
The dual-headed arrows at the bottom of the icon each represent an I/O bus. In this example,
the left-most arrow shows data for I/O bus 0:
5.  Inbound utilization rate for this I/O bus.
6.  Total utilization rate for this I/O bus. This includes control messages, so when displayed
    as a transfer rate (Mb/s), the total may be greater than the sum of the inbound and
    outbound transfer rates.
7.  Outbound utilization rate for this I/O bus.
Viewing xclus (Enhanced) Single-Core and Dual-Core AMD Opteron Node Icons
By default, xclus displays enhanced icons for AMD Opteron processors. Figure 7-3 shows an
enhanced icon for a node with four single-core AMD Opteron processors.
Figure 7-3 Four Single-Core AMD Opteron xclus Display
The four largest rectangles represent one processor each. Each processor has a smaller rectangle
attached to it representing local DRAM and arrows representing HyperTransport links.
1.  Node designator.
2.  Utilization rate for processor 0.
3.  Utilization rate for processor 0 DRAM.
4.  Utilization rate for processor 0 HyperTransport link 2 to external devices.
5.  Utilization rate for processor 0 HyperTransport link 0 (to processor 1, the processor to
    the right of processor 0).
6.  Utilization rate for processor 0 HyperTransport link 1 (to processor 3, the processor
    below processor 0).
Viewing xclus (Enhanced) Native Quad-Core AMD Opteron Node Icons
By default, xclus displays enhanced icons for AMD Opteron processors. Figure 7-4 shows an
enhanced icon for a node with two native quad-core AMD Opteron processors.
Figure 7-4 Native Quad-Core AMD Opteron xclus Display
The cores are represented by two sets of four small rectangles. The processors are dual-railed,
so each processor has two HyperTransport buses to the other processor.
1.  Node designator.
2.  CPU utilization rates for cores 0 - 3 on processor socket 0.
3.  CPU utilization rates for cores 4 - 7 on processor socket 1.
4.  Utilization rate for processor 0 DRAM.
5.  Utilization rate for processor 1 DRAM.
6.  Utilization rate for processor 0 HyperTransport link 0 to processor 1.
7.  Utilization rate for processor 0 HyperTransport link 1 to processor 1.
8.  Utilization rate for processor 1 HyperTransport link 0 to processor 0.
9.  Utilization rate for processor 1 HyperTransport link 1 to processor 0.
10. Utilization rate for processor 0 HyperTransport link 2 to external devices.
Viewing xcxclus (Generic) Node Icons
By default, the xcxclus utility displays generic icons for all processor types, and the information
displayed is the same for all processor types. Figure 7-5 shows a generic icon for a node.
Figure 7-5 Generic Node xcxclus Display
The callouts in the figure identify the following subareas:
1.  Node designator.
2.  Utilization rates for core 0 and core 1. The xcxclus utility displays a rectangle with
    the utilization rate for each processor.
3.  Utilization rates for memory.
The dual-headed arrows at the bottom of the icon each represent an I/O bus. In this example,
the left-most arrow shows data for I/O bus 0:
4.  Inbound utilization rate for this I/O bus.
5.  Total utilization rate for this I/O bus. This includes control messages, so when displayed
    as a transfer rate in Mb/s, the total may be greater than the sum of the inbound and
    outbound transfer rates.
6.  Outbound utilization rate for this I/O bus.
Showing Statistic Names and Descriptions
If you move your mouse over an icon area, xclus or xcxclus opens a window with the name
of the statistic and more information about the data. Figure 7-6 is an example of this display for
CPU 0 utilization on node onfire16.
Figure 7-6 CPU Description Window
Showing Bandwidth or Utilization Rates
By default, xclus and xcxclus show utilization rates for I/O devices. To change the display
to show bandwidth rates in Mb/s, select Options→bus_name→Bandwidth from the menu at
the top of the display, such as Options→IO Bus→Bandwidth or Options→HT
Bandwidth→Bandwidth.
Showing HyperTransport Data Statistics or Data and Control Statistics
By default, xclus shows statistics for HyperTransport links when all (data and control) packets
are measured. To show link statistics for data packets only, select Options→HT All.vs.Data→Data
from the menu at the top of the display. Statistics for all packets are useful for determining
channel saturation. Statistics for data packets only are useful for confirming data rates.
Changing the Refresh Rate
By default, the utilities refresh data once per second. You can change the refresh rate by selecting
Options→Refresh from the menu at the top of the display.
You can also change the refresh rate by setting the *xclus.refreshRate X11 resource before
you start the utility. For more information, see EXTERNAL INFLUENCES in xclus(1) or xcxclus(1).
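For example, the following sketch slows the updates by loading the resource with xrdb before
starting the utility (this assumes the resource value is the refresh interval in seconds; see
xclus(1) for the exact format):
% echo '*xclus.refreshRate: 5' | xrdb -merge
% xclus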
Hiding Statistic Values
To hide statistic values in the display, select View→Values from the menu at the top of the
display and deselect Values.
Suspending the Display
To suspend the display, select Hold from the menu at the top of the display.
Modifying the Display Size and Layout
You can modify the size of the xclus or xcxclus display. By default, the xclus or xcxclus
utility sizes its display to fill the screen. Modify the size of the display by doing one of the
following:
•   Selecting View→Zoom from the menu at the top of the display
•   Setting the *xclus.zoom X11 resource before starting xclus or xcxclus
You can also modify the number of icons that xclus or xcxclus displays per row. By default,
the xclus or xcxclus utility attempts to display eight node icons per row. You can specify an
alternate value for the row width as follows:
•   Specifying the -row-width argument when you start xclus or xcxclus
•   Setting the X11 resource *xclus.nodesPerRow before starting xclus or xcxclus
If xclus or xcxclus cannot fit all the nodes in one screen without exceeding the screen height,
it increases the number of nodes per row, and overrides any configured row width value.
For more information about setting X11 resources for these utilities, see the EXTERNAL
INFLUENCES section of xclus(1) or xcxclus(1).
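For example, either of the following sketches might set a 16-icon row width (the value 16 is
illustrative, and the option is assumed to take the width as its argument):
% xcxclus -row-width 16
% echo '*xclus.nodesPerRow: 16' | xrdb -merge
% xcxclus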
Using Enhanced (xclus) Menu Options
Table 7-1 describes the default menu options at the top of the xclus display window. The xclus
or xcxclus utilities support these menu options when displaying enhanced node information
(the default mode for xclus).
Table 7-1 xclus (Enhanced) Menu Options

Menu      Option              Description
File      Exit..              Stops the xclus utility.
Options   Group Control...    Opens a dialog box that enables you to control node grouping
                              parameters. This option is present only when node grouping is
                              active. For more information, see “Viewing Grouped Nodes”
                              (page 92).
          Refresh...          Opens a dialog box that enables you to set the refresh rate.
          Modify Key...       Opens a dialog box that enables you to change the
                              utilization-to-color mapping for all components except CPU
                              utilization.
          CPU                 Enables you to display core utilization in terms of user or
                              system statistics, or both. By default, xclus displays both.
          Mid/Front Side Bus  Enables you to display the MID and FSB bus data by bandwidth
                              (Mb/s) or utilization. (Supported only on Itanium processors.)
          IO Bus              Enables you to display the I/O bus data in terms of bandwidth
                              (Mb/s) or utilization. (Supported only on Itanium processors.)
          HT Bandwidth        Enables you to display the HyperTransport data in terms of
                              bandwidth (Mb/s) or utilization. (Supported only on AMD
                              Opteron processors.)
          HT All.vs.Data      Enables you to display HyperTransport data for all events or
                              for data events only. (Supported only on AMD Opteron
                              processors.)
View      Zoom                Enables you to show the xclus display at 50%, 75%, 100%, or
                              125% of its size.
          Key                 Hides or displays the color key at the bottom of the display.
          Node's              Enables you to display the full node name or only the node
                              number.
          Values              Hides or displays the utilization values in the node icons.
          Hold                Enables you to suspend the display until this button is
                              released.
Using Generic (xcxclus) Menu Options
Table 7-2 describes the default menu options at the top of the xcxclus display window. The
xcxclus or xclus utilities support these menu options when displaying generic node
information (the default mode for xcxclus).
Table 7-2 xcxclus (Generic) Menu Options

Menu      Option              Description
File      Exit..              Stops the xcxclus utility.
Options   Group Control...    Opens a dialog box that enables you to control node grouping
                              parameters. This option is present only when node grouping is
                              active. For more information, see “Viewing Grouped Nodes”
                              (page 92).
          Refresh...          Opens a dialog box that enables you to set the refresh rate.
          Modify Key...       Opens a dialog box that enables you to change the
                              utilization-to-color mapping for all components except CPU
                              utilization.
          CPU                 Enables you to display core utilization in terms of user or
                              system statistics, or both.
          Sys. Memory         Enables you to display the data in terms of the memory used
                              for the application or the total memory used.
          Network Util, BW    Enables you to display the network (Ethernet and interconnect)
                              data by bandwidth (Mb/s) or utilization.
View      Zoom                Enables you to show the xcxclus display at 50%, 75%, 100%, or
                              125% of its size.
          Key                 Hides or displays the color key at the bottom of the display.
          Node's              Enables you to display the full node name or only the node
                              number.
          Values              Hides or displays the utilization values in the node icons.
          Hold                Enables you to suspend the display until this button is
                              released.
Recording, Replaying, and Plotting xclus and xcxclus Data
You can save the data from the xclus or xcxclus utility in a file. The utilities update data for
each monitored node every second. You can use this data file either to replay the data or to plot
graphs of node performance statistics.
Recording Data
To record data and create a data file, specify the -output option when starting the xclus or
xcxclus utility. The syntax is as follows:
xclus|xcxclus -output data_file_prefix
The xclus or xcxclus utility creates a data file with the name data_file_prefix.xclus.
To run the xcxclus utility and store the output in a file named test.xclus, enter the following
command:
% xcxclus -output test
Suppressing the Display (-quiet)
To create a data file without opening an xclus or xcxclus display window, specify the -quiet
option with the -output option. To terminate an xclus or xcxclus process without a display
window, you must manually send the process an interrupt (Control-c) or KILL signal.
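For example, a minimal sketch that records data quietly in the background and later stops the
recording by sending an interrupt to the background job:
% xcxclus -output test -quiet &
% kill -INT %1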
Replaying Data
To play back the display from recorded data, specify the data file prefix when you invoke the
xclus or xcxclus utility. The following command line plays back the test.xclus data file:
% xclus test
The graphical display differs from normal xclus and xcxclus displays because there is an
additional pull-down menu named Control next to the File menu. Choosing the Play... option
from the Control menu opens a dialog box that you can use to control the playback.
Plotting Data
The -plot option uses gnuplot to display plotted data from an xclus or xcxclus data file
created using the -output option. The -plot option also creates a gnuplot data file and a
gnuplot script to redisplay the data. The syntax is as follows:
% xclus|xcxclus -plot plot_file_prefix data_file_prefix
Where:
plot_file_prefix      Specifies the prefix the utility uses to create the gnuplot data and
                      script files.
data_file_prefix      Specifies the file name prefix used to create the data file.
To plot data from the data file created using the command xclus -output test, enter the
following command:
% xclus -plot testplot test
The xclus utility displays the following prompt:
Do you wish to view the created plots now using the command
$ /opt/xtools/gnu/bin/gnuplot prefix.xclus.gnuplot
(y/[n])?
Enter y to display the plotted data for the first node. Each time you press Enter, the utility
displays the plotted data for the next node monitored. Continue pressing Enter until the utility
displays data for all the nodes.
The xclus or xcxclus utility creates the following files:
plot_file_prefix.xclus.plotdata    Data file for gnuplot.
plot_file_prefix.xclus.gnuplot     Script file for gnuplot.
You can redisplay the plotted data using the /opt/xtools/gnu/bin/gnuplot command
with the plot_file_prefix.xclus.gnuplot file name as its operand.
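Continuing the earlier example, the following command redisplays the plots created with the
plot file prefix testplot:
% /opt/xtools/gnu/bin/gnuplot testplot.xclus.gnuplot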
Figure 7-7 shows xclus plotted data.
Figure 7-7 Plotted Data from xclus
Starting xperf or xcxperf from xclus or xcxclus
To start xperf from xclus or to start xcxperf from xcxclus, click a node icon.
Viewing Grouped Nodes
If you are monitoring a large number of nodes, xclus or xcxclus groups nodes with similar
performance profiles and displays a single icon for the group. The node designator for the icon
is in the format n(node_designators), where n is the number of nodes in the group and
node_designators is an abbreviated list of the node names or node numbers of the group
members. If you move your mouse over the icon, the utility opens a window with information
about the group, including a complete list of the members.
Figure 7-8 shows an xclus group icon that represents three nodes, with the information window
for the group.
Figure 7-8 xclus Group Icon
Viewing Individual Node Icons
To view icons for individual nodes in a group icon, click the group icon.
Controlling Group Displays
By default, xclus and xcxclus group nodes if the number of monitored nodes exceeds the
group threshold number; the default group threshold number is 64 nodes. To change the group
threshold number, specify the -node-grouping-threshold option when you start the utility.
To force the xclus or xcxclus utility to group nodes even if the number of monitored nodes
is less than the threshold, specify the -group-nodes option when you start the utility.
To force the xclus or xcxclus utility to not group nodes even if the number of monitored
nodes is greater than the threshold, specify the -no-group-nodes option when you start the
utility.
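For example (the threshold value 32 is illustrative, and the option is assumed to take the
threshold as its argument):
% xcxclus -node-grouping-threshold 32
% xclus -no-group-nodes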
Modifying the Parameters that Define a Group
The xclus and xcxclus utilities use the following statistics to determine if Itanium processors
are members of the same group:
•   CPU utilization
•   FSB utilization
•   MID bus utilization
•   I/O bus utilization
The xclus and xcxclus utilities use the following statistics to determine if AMD Opteron
processors are members of the same group:
•   CPU utilization
•   DRAM utilization
•   HyperTransport link utilization (processor-to-processor)
•   HyperTransport link utilization (to external devices)
If all the above utilization rates (for a given processor type) are the same within a tolerance range,
the nodes are placed in the same group. The default tolerance is 7%, so an Itanium node with a
50% CPU utilization rate and an Itanium node with a 53% CPU utilization rate are in the same
group if the other Itanium grouping parameters (FSB utilization, MID bus utilization, and I/O
bus utilization) also differ by 7 percentage points or less.
To modify the tolerance and other parameters that determine the proximity of the statistics for
grouping, select Options→Group Control... from the menu.
The Group Control dialog box contains the following options:
Tolerance                  The allowed difference in the utilization percentages in the same
                           group.
                           Range: 0 - 100%
                           Default: 7%
Grace                      The extended tolerance for existing members of a group. The
                           utilization rates for nodes that are members of the same group can
                           differ by the grace value and remain in the same group.
                           Range: 0 - 100%
                           Default: 20%
Grace Delay, in Seconds    The number of update cycles (specified in seconds) that nodes with
                           statistics out of the tolerance range but within the grace range can
                           remain grouped together. When the grace delay is exceeded, the
                           utility regroups the nodes into groups that are within tolerance.
                           Range: 0 - 10 seconds
                           Default: 3 seconds
Subsume Delay, in Seconds  The number of update cycles (specified in seconds) that groups can
                           have the same statistics (within the tolerance value) and not be
                           combined. This causes xclus or xcxclus to delay combining
                           groups when node statistics are similar for only brief time periods.
                           Range: 0 - 10 seconds
                           Default: 3 seconds
Using xperf and xcxperf
The following sections describe general procedures for using xperf and xcxperf. The xperf
and xcxperf utilities are similar, and the procedures for using them are the same, with the
following differences:
• By default, the xperf utility displays enhanced data. By default, the xcxperf utility displays
generic data.
The sections “Viewing Generic Data with xclus or xperf” (page 109) and “Viewing
Enhanced Data with xcxclus or xcxperf” (page 110) describe how to display alternate
data sets.
•   By default, you can start an HPCPI label from xperf. You cannot start a label from
    xcxperf unless you are displaying enhanced data.
Starting xperf and xcxperf
To start xperf or xcxperf, follow these steps:
1.  Set up the Xtools environment. See “Step 1: Setting Up the Xtools Environment” (page 78).
2.  Set the DISPLAY environment variable. See “Step 2: Setting the DISPLAY Environment
    Variable” (page 78).
3.  Start the xperf or xcxperf program by entering the xperf or xcxperf command.
You can also start xperf by clicking a node in an xclus display, and you can start xcxperf
by clicking a node in an xcxclus display.
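For example, a typical C shell session might look like the following (the display name is a
placeholder):
% module load xtools
% setenv DISPLAY mydesktop:0.0
% xcxperf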
Viewing xperf and xcxperf Displays
By default, xperf displays graphs for the statistics listed in “Viewing Itanium xperf (Enhanced)
Statistics” (page 96) or “Viewing AMD Opteron xperf (Enhanced) Statistics” (page 98), and
xcxperf displays graphs for the statistics listed in “Viewing xcxperf (Generic) Statistics”
(page 101).
The following sections describe how to complete these tasks:
• “Viewing Itanium xperf (Enhanced) Statistics” (page 96)
• “Viewing AMD Opteron xperf (Enhanced) Statistics” (page 98)
• “Viewing xcxperf (Generic) Statistics” (page 101)
• “Displaying Color Legends and Creating Tear-Away Legends” (page 104)
• “Hiding or Showing Graphs” (page 104)
• “Showing I/O Bandwidth or Utilization Rates” (page 104)
• “Showing Cycles Per Instruction or Instructions Per Cycle” (page 104)
• “Modifying Graph Colors and Line Widths” (page 104)
• “Using xperf (Enhanced) Menu Options” (page 104)
• “Using xcxperf (Generic) Menu Options” (page 105)
Viewing Itanium xperf (Enhanced) Statistics
Figure 1-2 (page 22) shows an xperf display for an Itanium system. By default, xperf displays
graphs with processor-dependent, enhanced statistics. The xperf utility displays graphs with
the following enhanced statistics for Itanium processors.
NOTE: The processor event names listed in this section are used for Itanium Processor 9000
series and may differ slightly from the event names used for other Itanium processor types.
CPU
Displays the following CPU utilization percentages from the /proc/stat file:
• Idle: CPU utilization for idle cycles
• System: CPU utilization for system processes
• User: CPU utilization for user processes
Instructions
Displays the following instructions per cycle (IPC) statistics:
• Raw: IA64_INST_RETIRED + PREDICATE_SQUASHED_RETIRED
• Effective: IA64_INST_RETIRED - NOPS_RETIRED
FPC
Displays the statistic Retired, which is the number of floating point operations retired per cycle
(FP_OPS_RETIRED).
Cycles
Displays the following statistics, per cycle:
•   RSE Active: BE_RSE_BUBBLE
•   D1TLB: BE_L1D_FPU_BUBBLE.L1D_TLB
•   D2TLB: BE_L1D_FPU_BUBBLE.L1D_HPW
•   Data Access: (BE_L1D_FPU_BUBBLE.L1D - BE_L1D_FPU_BUBBLE.L1D_HPW -
    BE_L1D_FPU_BUBBLE.L1D_TLB) + (BE_EXE_BUBBLE.GRALL -
    BE_EXE_BUBBLE.GRGR + BE_EXE_BUBBLE.FRALL)
•   Score Board (stalls waiting for resources): BE_L1D_FPU_BUBBLE.FPU +
    BE_EXE_BUBBLE.GRGR + BE_EXE_BUBBLE.ARCR_PR_CANCEL_BANK
•   BE Flush: BE_FLUSH_BUBBLE.ALL
•   Taken Branch: BACK_END_BUBBLE.FE * ((BE_LOST_BW_DUE_TO_FE.ALL -
    (BE_LOST_BW_DUE_TO_FE.TLBMISS + BE_LOST_BW_DUE_TO_FE.IMISS)) /
    BE_LOST_BW_DUE_TO_FE.ALL)
•   Inst Access: BACK_END_BUBBLE.FE * ((BE_LOST_BW_DUE_TO_FE.TLBMISS +
    BE_LOST_BW_DUE_TO_FE.IMISS) / BE_LOST_BW_DUE_TO_FE.ALL)
•   Execution: CPU_CYCLES - BACK_END_BUBBLE.ALL
Cache
Displays the following types of cache misses per second calculated from cache miss counts and
the time penalties for accessing different levels of cache memory:
• L1icache misses: Level 1 instruction cache misses
• L1dcache misses: Level 1 data cache misses
• L2cache misses: Level 2 cache misses (single-core only)
• L2icache misses: Level 2 instruction cache misses (dual-core only)
• L2dcache misses: Level 2 data cache misses (dual-core only)
•   L3cache misses: Level 3 cache misses
•   TLB misses: Translation Lookaside Buffer misses
SysBus
Displays the following system bus utilization rates:
• Address: BUS_ALL.ANY
• Data: BUS_DATA_CYCLE
Branch
Displays the following branching statistics:
•   Branch Potential Speedup: The potential speed-up based on IPC if there were no
    branch bubbles
•   Branch Bubbles per Bubbles: Branch bubbles as a percentage of all bubbles
•   Branch Bubbles per Cycle: Branch bubbles per CPU cycle
Sum I/O B/W
Displays the I/O bandwidth summed for all buses in Mb/s:
• Write: Write I/O bandwidth
• Read: Read I/O bandwidth
DMA B/W
Displays the DMA bandwidth, in Mb/s:
• Write: Write DMA bandwidth
• Read: Read DMA bandwidth
Viewing AMD Opteron xperf (Enhanced) Statistics
Figure 7-9 shows an xperf display for an AMD Opteron system. By default, xperf displays
graphs with processor-dependent, enhanced statistics.
Figure 7-9 xperf Display for an AMD Opteron System
The xperf utility displays graphs with the following enhanced statistics for AMD Opteron
processors.
NOTE: AMD does not provide code-usable names for AMD Opteron processor events. In
addition, the names listed in this section are used for single-core and dual-core AMD Opteron
processors and may differ slightly from the event names used for native quad-core AMD Opteron
processors.
CPU
Displays the following CPU utilization percentages from the /proc/stat file:
• Idle: CPU utilization for idle cycles
• System: CPU utilization for system processes
• User: CPU utilization for user processes
IPC
Displays retired instructions per cycle.
FPC
Displays floating point operations retired per cycle.
Cycles
Displays the following statistics per unhalted clock cycle:
•   Far Transfer Resync: DISPATCH_STALL_FAR_TRANSFER
•   Waiting for All quiet: DISPATCH_STALL_QUIET_WAIT
•   Segment Load: DISPATCH_STALL_SEG_LOAD
•   Serialization: DISPATCH_STALL_SERIALIZATION
•   Branch Abort to Retire: DISPATCH_STALL_FROM_BRANCH_ABORT
•   LS Full: DISPATCH_STALL_LS
•   FPU Full: DISPATCH_STALL_FPU
•   Reservation Station Full: DISPATCH_STALL_RESERVE_STATION
•   Reorder Buffer Full: DISPATCH_STALL_REORDER_BUFFER
Execution
Displays the following events per unhalted instruction:
• Dispatch Stalls: DISPATCH_STALLS
• Decoder Empty: DECODER_EMPTY
• Dispatching: Unhalted Cycles - DISPATCH_STALLS - DECODER_EMPTY
Dcache
Displays the following types of data cache misses per retired instruction:
• Retired Instructions: Retired instructions (100%)
• L1D Misses: DATA_CACHE_REFILLS_FROM_L2_ALL
• L2D Misses: DATA_CACHE_REFILLS_FROM_LS_FROM.SYSTEM
• Misses: DATA_CACHE_MISSES
• Dcache Fetches: DATA_CACHE_FETCHES
Icache
Displays the following types of instruction cache misses per retired instruction:
• Retired Instructions: Retired instructions (100%)
• L1I Misses: ICACHE_REFILLS_FROM_SYSTEM
•   L2I Misses: ICACHE_REFILLS_FROM_LS_FROM.SYSTEM
•   Misses: ICACHE_MISSES
•   Icache Fetches: ICACHE_FETCHES
Branch
Displays the following branch metrics:
• Branch Rate: RETIRED_BRANCHES / RETIRED_INSTRS
• Mispredicts: RETIRED_BRANCHES_MISPREDICTED / RETIRED_BRANCHES
• Branches Taken: RETIRED_TAKEN_BRANCHES / RETIRED_BRANCHES
DRAM
Displays the following events per 10k unhalted cycles:
• DRAM Conflicts Per 10k Cycles: DRAM_ACCESSES.PAGE_CONFLICT
• DRAM Misses Per 10k Cycles: DRAM_ACCESSES.PAGE_MISS
• DRAM Hits Per 10k Cycles: DRAM_ACCESSES.PAGE_HIT
Memory (Third Generation AMD Opteron Only)
Displays the following events per unhalted cycle:
• Read Requests: MEMCTL_REQUESTS.READ_REQS_TO_DCT
• Write Requests: MEMCTL_REQUESTS.WRITE_REQS_TO_DCT
• Prefetch Requests: MEMCTL_REQUESTS.PREFETCH_REQS_TO_DCT
CpuIoRequests (AMD Opteron Only)
Displays the following local and remote requests events per retired instruction:
• Local Memory Requests: REQUEST_FLOW.LOCAL_TO_LOCAL.ANY_TO_MEM
• Remote Memory Requests: REQUEST_FLOW.LOCAL_TO_REMOTE.ANY_TO_MEM
• Local IO Requests: REQUEST_FLOW.LOCAL_TO_LOCAL.ANY_TO_IO
• Remote IO Requests: REQUEST_FLOW.LOCAL_TO_REMOTE.ANY_TO_IO
•   Other: All other requests
HTn (HyperTransport Links)
Displays the HyperTransport bandwidth utilization for the following types of traffic, where n
is the link number:
• Release: HT_LINK_n_XMIT_BW.RELEASE / HT_LINK_n_XMIT_BW.ALL
• Data: HT_LINK_n_XMIT_BW.DATA / HT_LINK_n_XMIT_BW.ALL
• Command: HT_LINK_n_XMIT_BW.COMMAND / HT_LINK_n_XMIT_BW.ALL
• Nop: The remainder of the bandwidth
Viewing xcxperf (Generic) Statistics
Figure 7-10 shows an xcxperf display. By default, xcxperf displays graphs with
processor-independent statistics for all processors.
Figure 7-10 xcxperf Display
The xcxperf utility displays graphs with the following generic statistics. If a component is not
installed on a system, xcxperf does not display the corresponding graph.
CPU
Displays the following CPU utilization percentages from the /proc/stat file:
• Idle: CPU utilization for idle cycles
• System: CPU utilization for system processes
• User: CPU utilization for user processes
Disk
Displays the throughput rates in Mb/s for the following disk activities from /proc/diskstats:
• Write
• Read
NFS
Displays statistics for the following NFS activities in calls per second from /proc/net/rpc/nfs:
• Write
• Read
Lustre
Displays the throughput rates in Mb/s for the following Lustre activities from
/proc/fs/lustre/llite:
• Write
• Read
Infiniband
Displays the throughput rates in Mb/s for the following Infiniband activities from
/proc/voltaire:
• Write
• Read
Ethernet
Displays the throughput rates in Mb/s for the following Ethernet activities from /proc/net/dev:
• Write
• Read
Interrupts
Displays the number of interrupts per second (Usage) from /proc/stat.
ContextSwitch
Displays the number of context switches per second (Usage) from /proc/stat.
Sockets
Displays the following statistics from /proc/net/sockstat:
•   Frag Mem: Total memory used for fragmented sockets
•   Frag Inuse: Number of fragmented sockets in use
•   Raw Inuse: Number of raw sockets in use
•   UDP Inuse: Number of UDP sockets in use
•   TCP Mem: Total memory used for TCP sockets
•   TCP Alloc: Number of all TCP sockets allocated
•   TCP TW: Number of TCP sockets in TIME-WAIT state
•   TCP Orphan: Number of TCP sockets orphaned
•   TCP Inuse: Number of TCP sockets in use
•   Sockets Used: Total number of sockets in use
Elan
Displays the throughput rates in Mb/s for the following Quadrics QsNetII interconnect activities
from Elan memory registers:
• Write
• Read
Memory
Displays the following utilization rates (percentages) from /proc/meminfo:
• Free: Free memory
• Buffers: Memory in the buffer caches
• Cached: Memory in the page cache minus the swap cache
• SwapCached: Memory in the swap cache—memory that once was swapped out and is
swapped back in, but also still in the swapfile
• Application: Memory that is neither in a cache nor free
Swap
Displays the following utilization rates (percentages) from /proc/meminfo:
•   Free: Amount of swap space free
•   Used: Amount of swap space in use
VMalloc
Displays the following utilization rates (percentages) for memory allocated using vmalloc(),
as gathered from /proc/meminfo:
•   Free: Amount of free dynamic memory
•   Used: Amount of dynamic memory in use
Displaying Color Legends and Creating Tear-Away Legends
To display the color legend for a graph, select the menu item with the graph name, such as CPU
in Figure 7-11.
If you select the tear-away icon (the perforated line at the top of the drop-down menu, which is
circled in Figure 7-11), the xperf and xcxperf utilities create a tear-away (standalone) color
legend for the graph. You can move the legend next to the appropriate graph for visual correlation.
Figure 7-11 Displaying the CPU Color Legend
Hiding or Showing Graphs
To hide a graph, select Options→Hide/Show from the menu at the top of the display, then select
the name of the graph you want to hide. For example, you can hide the FPC (floating point)
graph by selecting Options→Hide/Show→FPC and toggling the setting for FPC.
Showing I/O Bandwidth or Utilization Rates
By default, xperf shows utilization rates for I/O devices. To show bandwidth rates in Mb/s,
select Options→I/O→Bandwidth from the menu at the top of the display.
Showing Cycles Per Instruction or Instructions Per Cycle
By default, xperf shows instructions per cycle (IPC). To show cycles per instruction (CPI), select
Options→Instructions→CPI from the menu at the top of the display.
Modifying Graph Colors and Line Widths
To change the colors or line widths for a graph, modify the X11 resource
*xperf*graph_name.colors or *xperf*graph_name.lineWidth for the graph before
starting xperf or xcxperf. To determine the graph name, enter the command xperf -graphs
or xcxperf -graphs.
For more information, see the EXTERNAL INFLUENCES section of xperf(1) or xcxperf(1).
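For example, the following sketch lists the graph names, then widens the lines of the CPU graph
before restarting xperf (the value 3 is illustrative, and lineWidth is assumed to take a width
in pixels):
% xperf -graphs
% echo '*xperf*CPU.lineWidth: 3' | xrdb -merge
% xperf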
Using xperf (Enhanced) Menu Options
Table 7-3 describes the default menu options at the top of the xperf display window. The xperf
or xcxperf utilities support these menu options when displaying enhanced node information
(the default mode for xperf).
Table 7-3 xperf (Enhanced) Menu Options

Menu      Option              Description
File      Exit..              Stops the xperf utility.
Options   Hide/Show           Opens a submenu with graph names that enables you to hide or
                              show graphs.
          System Information  Opens a dialog box that displays system information, as shown
                              in Figure 7-12 (page 108).
          Instructions        Opens a dialog box that enables you to display cycles per
                              instruction instead of instructions per cycle.
          I/O                 Enables you to display the I/O data in terms of bandwidth
                              (Mb/s) or utilization. (Supported only on Itanium processors.)
          HT Bandwidth        Enables you to display the HyperTransport data in terms of
                              bandwidth (Mb/s) or utilization. (Supported only on AMD
                              Opteron processors.)
          graph_name          Displays a color legend for the graph, with the option to
                              create a tear-away legend, as described in “Displaying Color
                              Legends and Creating Tear-Away Legends” (page 104).
Help                          Displays version information.
Using xcxperf (Generic) Menu Options
Table 7-4 describes the default menu options at the top of the xcxperf display window. The
xcxperf or xperf utilities support these menu options when displaying generic node
information (the default mode for xcxperf).
Table 7-4 xcxperf (Generic) Menu Options

Menu      Option              Description
File      Exit..              Stops the xcxperf utility.
Options   Hide/Show           Opens a submenu with graph names that enables you to hide or
                              show graphs.
          System Information  Opens a dialog box that displays system information, as shown
                              in Figure 7-12 (page 108).
          graph_name          Displays a color legend for the graph, with the option to
                              create a tear-away legend, as described in “Displaying Color
                              Legends and Creating Tear-Away Legends” (page 104).
Help                          Displays version information.
Starting an HPCPI Label from xperf
You can start an HPCPI label and collect data for that label from the xperf utility. An HPCPI
label enables you to analyze a time interval of an application or system. To start an HPCPI label
from xperf, select HPCPI→Start Label from the menu at the top of the display.
When you start an HPCPI label from the xperf utility, the label applies to all processes on the
system. If no HPCPI daemon was running when xperf started, xperf starts an HPCPI daemon
that samples CPU cycles only and uses the database directory /var/opt/xtools/log/db.
If an HPCPI daemon was running when xperf started, HPCPI continues to sample the same
events. When you view the reports for the label, you see all events HPCPI sampled.
The xperf utility displays the label name it creates at the bottom of its display window. The
label name uses the following format:
userName___timestamp[a-z]
Where:
userName      Your user name, padded to 11 characters using underscores (_).
timestamp     The GMT timestamp (mmddhhmm).
[a-z]         A lowercase character starting with a and incremented to disambiguate labels
              that are otherwise the same.
Before viewing the HPCPI data, stop collecting data for the label by selecting HPCPI→Stop
Label.
To view the HPCPI data, select HPCPI→View Report Last Label, which opens a new window
and displays hpcpiprof output with per-image CPU utilization statistics. Alternatively, you
can select HPCPI→View Report, which opens a window that enables you to select a report based
on the label name.
You can copy the HPCPI database to another directory by selecting HPCPI→Copy Database,
which opens a window that enables you to select the target location. This is useful for preserving
the HPCPI database from the node. If the node is a compute node in an HP XC cluster, you might
be unable to access the node on which xperf ran, or the HPCPI database directory, after your
job completes.
You can also view the data for your label using other HPCPI utilities and options by manually
running the utilities and specifying the appropriate database directory. If other labels were
active in the same directory, or the directory contains non-labeled performance data files,
specify your label name to display only the data for your label.
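For example, the following sketch uses the hpcpiprof options from Appendix B with the default
database directory and a hypothetical label name in the format described above:
% hpcpiprof -db /var/opt/xtools/log/db -label user1______03121754a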
Recording, Replaying, and Plotting xperf and xcxperf Data
You can save the data from the xperf or xcxperf utility in a file. The utilities update data for
each monitored node every second. You can use this data file either to replay the data or to plot
graphs of node performance statistics.
The procedures and options (-output, -plot) for recording, replaying, and plotting data are
the same as the procedures used with xclus and xcxclus (“Recording, Replaying, and Plotting
xclus and xcxclus Data” (page 89)). The only difference is that the names of the files xperf
or xcxperf create have xperf in the suffix instead of xclus. The file names created are as
follows:
data_file_prefix.xperf             Data file for xperf or xcxperf.
plot_file_prefix.xperf.plotdata    Data file for gnuplot.
plot_file_prefix.xperf.gnuplot     Script file for gnuplot.
Displaying System Information with xperf or xcxperf
If you select Options→System Information from the menu at the top of the display, xperf or
xcxperf opens a display window with system information. Figure 7-12 shows an xcxperf
system information window.
Figure 7-12 System Information Display
Viewing Generic Data with xclus or xperf
By default, the xclus and xperf utilities display enhanced data. You can force xclus and
xperf to display generic data by specifying the -generic option. For example, the following
command starts xclus so it displays generic data:
% xclus -generic
Starting xclus with the -generic option causes it to display the same data that xcxclus
displays. The difference is that you do not need a job allocation, and you do not need superuser
capabilities to specify the -unrestricted-nodes option, which enables you to monitor nodes
outside your job allocation.
Starting xperf with the -generic option causes it to display the same data that xcxperf
displays. The difference is that if you specify the -node option to run xperf on a remote node,
the remote node does not need to be within your job allocation.
You can specify both the -generic and the -enhanced option when you run xperf. The
xperf utility displays both generic and enhanced data. You can also specify both options when
you run xclus. The xclus utility displays only generic data, but if you click on a node icon,
xperf starts and displays both generic and enhanced data.
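For example:
% xperf -generic -enhanced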
Viewing Enhanced Data with xcxclus or xcxperf
By default, the xcxclus and xcxperf utilities display generic data. You can force xcxclus
and xcxperf to display enhanced data by specifying the -enhanced and either the -apmond
or -clusmond option. For example, the following command starts xcxclus so it displays
enhanced data:
% xcxclus -enhanced -apmond
Starting xcxclus with the -enhanced option causes it to display the same data that xclus
displays. The difference is that you need superuser privileges to specify the
-unrestricted-nodes option to monitor nodes outside your job allocation.
You must specify the -apmond or -clusmond option to view enhanced data because the default
daemons used by xcxclus and xcxperf cannot collect enhanced data. For more information
about the Xtools daemons, see “Xtools Daemons” (page 111).
Starting xcxperf with the -enhanced option causes it to display the same data that xperf
displays. The difference is that you need superuser privileges to specify the -node and
-unrestricted-nodes options to run xcxperf on a node outside your job allocation.
Xtools Daemons
Xtools use the following daemons:
•   apmond and clusmond
The apmond and clusmond daemons are included with the Xtools software and collect
enhanced statistics. The Xtools start these daemons when you run xclus or xperf using
default parameters if they are not already running.
The apmond daemon collects processor-specific data for an individual node and runs on
each node being monitored. Only one instance of apmond is needed per node.
The clusmond daemon aggregates data it collects from the apmond daemons. There is
generally one clusmond process on each node where a user is running xclus or xperf
(in HP XC clusters, there is one clusmond process on each login node where xclus or
xperf is running).
The apmond and clusmond daemons are started by xinetd and are not in the user process
tree.
•   mond and supermon
The mond and supermon daemons are included with the HP XC software and collect generic
statistics. The Xtools start these daemons when you run xcxclus or xcxperf using default
parameters.
The mond daemon collects generic data for an individual node and runs on each node being
monitored. Only one instance of mond is needed on a node.
The supermon daemon aggregates data it collects from the mond daemons. The Xtools start
supermon as a child process under xcxclus or xcxperf.
By default, supermon runs on the login node. If you specify the -login-node option when
you start xcxclus or xcxperf, supermon runs on the LSF-HPC execution host.
A Product Specifications
This appendix contains product specifications.
HPCPI Database Directories and Files
The database root directory contains the following items:
• A subdirectory for each epoch. The subdirectory names are based on the Greenwich Mean
Time (GMT) timestamp for the start of the epoch and have the following format:
yyyymmddhhmm
Where:
yyyy is the year
mm is the month
dd is the day
hh is the hour (using a 24-hour clock)
mm are the minutes
For example, an epoch started on January 2, 2008 at 17:30 GMT has the subdirectory name
200801021730.
You can specify the -epoch-with-seconds option when you start the hpcpid daemon
to append the seconds from the timestamp to the subdirectory name.
•   Log files for hpcpid. Each log file name has the format hpcpid-host_name.log, where
    host_name is the name of the system where hpcpid is running. For example,
    hpcpid-system1.log.
The epoch subdirectories contain subdirectories for the systems on which hpcpid runs. Each
subdirectory name is the host name of the system.
The system subdirectories contain profile files (performance data files). The profile file names
have the following format:
imageName.checksum[_labelName]_imageNameLength
Where:
imageName          The image file name.
.                  A literal period delimiter.
checksum           A checksum value (16 hexadecimal digits) for the image file. HPCPI
                   uses this value to verify that the image file it uses when analyzing
                   performance data matches the image file executed to generate the
                   performance data. The checksum enables HPCPI to distinguish between
                   image files with the same name in multiple directories. It also enables
                   HPCPI to distinguish between multiple versions or compilations of an
                   image file.
_                  A literal underscore delimiter. This is present only if the profile file
                   contains labeled data (the hpcpictl label command was used to
                   associate data with a label name).
labelName          The label name. This is present only if the profile file contains labeled
                   information (the hpcpictl label command was used to associate
                   data with a label name).
_                  A literal underscore delimiter.
imageNameLength    The length of the image name.
Examples
Figure A-1 shows a sample HPCPI database.
Figure A-1 HPCPI Database

$HPCPIDB
    200802141532/
        ...
    200802141712/
        node1/
        node2/
            App12.ebadcb63fb63e830_myLabel_5
            sum.0479a583cd891014_3
            ...
        node3/
    200802141744/
        ...
The HPCPIDB environment variable is set to /tmp/hpcpidb. The epoch that started on
February 14, 2008 at 17:12 GMT contains the following profile file with data for sum on the
system node2:
/tmp/hpcpidb/200802141712/node2/sum.0479a583cd891014_3
The following profile file contains data for the application App12 and is associated with the label
myLabel:
/tmp/hpcpidb/200802141712/node2/App12.ebadcb63fb63e830_myLabel_5
hpcpicat Output
The hpcpicat utility displays the contents of a performance data file with minimal formatting.
This utility is a debugging tool for advanced users and is not intended for general performance
profiling. A sample output follows:
name        myApp                                           1
image       2095041907123563e830                            2
path        /var/users/user1/bin/myApp                      3
epoch       200801010338                                    4
platform    node6                                           5
text_start  0x4000000000000000                              6
text_size   0x000000000e40                                  7

event  27374040000 (456234)  CPU_CYCLES:60000:1/1           8-13
event  25317240000 (421954)  NOPS_RETIRED:60000:1/1
event  49626660000 (827111)  IA64_INST_RETIRED:60000:1/1

0x4000000000000500  330120000  2563080000  480000           14-18
0x4000000000000501  293220000     240000   420000
0x4000000000000510  277740000          0   780000
:
:

The numbered callouts identify the following parts of the output:
1.  The image name.
2.  The checksum for the image. This enables HPCPI to distinguish between image files
    with the same name in multiple directories. It also enables HPCPI to distinguish
    between multiple versions of an image file.
3.  The fully-qualified path name for the image file.
4.  The epoch. See “HPCPI Database Directories and Files” (page 113) for the epoch name
    format.
5.  The host name of the system on which hpcpid ran.
6.  The starting virtual memory address for the loaded image.
7.  The size of the executable portion of the image, in bytes.
8.  The next three lines contain a table with event information. Each line contains
    information for one event.
9.  The event count for the event, as calculated by HPCPI. This is the number of samples
    recorded for the event multiplied by the sampling interval.
10. The number of samples recorded for the event.
11. The event name.
12. The sampling interval. If n is the sampling interval, HPCPI records the instruction
    pointer every nth occurrence of the event.
13. The active fraction for the event, in the form 1/n, which is the fraction of time the
    event was active in the PMU. If this value is less than 1/1, the number of events
    sampled was greater than the number of performance counters and the event was not in
    all duty groups.
14. A table with per-instruction statistics. Each line contains event counts for an
    instruction.
15. The virtual memory address of the instruction pointer.
16. The calculated event count for the first event (CPU_CYCLES in this example).
17. The calculated event count for the second event (NOPS_RETIRED in this example).
18. The calculated event count for the third event (IA64_INST_RETIRED in this example).
HPCPI Product Limitations
The following sections describe HPCPI product limitations.
Skid
When a PMU counter overflows, the PMU calls the interrupt handler to record the value of the
instruction pointer. However, there is a lag—or skid—that occurs because of the delay between
the time the instruction enters the execution pipeline and when the PMU detects the event counter
overflow. There is also a delay between the time the PMU detects the overflow and when the
interrupt handler records the instruction pointer. In addition, the interrupt handler records the
instruction pointer for the instruction to return to after handling the interrupt. Because of this
lag, the recorded instruction pointer is typically several or many instructions after the instruction
that caused the event. These lags are common to all instruction-pointer sampling profilers and
HPCPI does not attempt to model the system to correct for this. As a result, HP recommends
that you examine the assembly code surrounding regions where high event counts occur and
consider if the surrounding code may be triggering the events.
Attribution Issues
Associating PMU events with specific lines in source code is problematic with modern compilers
and multi-issue architectures, especially when code optimization is performed.
Inline Routines
When a program contains short routines or routines that are called only once, compilers often
inline, or insert a copy of the function's instructions directly within the calling procedure.
When hpcpiprof displays data, it attempts to detect inlined procedures and attribute events
to the source of the inlined procedure instead of the calling procedure. If hpcpiprof cannot
detect or determine the source of an inlined procedure, it attributes the events to the calling
procedure.
The hpcpilist utility always shows the instructions for inlined procedures and the event
counts for the instructions within the calling procedure.
Note that there can be differences in hpcpiprof output and hpcpilist output for inlined
procedures, because hpcpiprof might attribute events to the inlined procedures, but hpcpilist
shows these events only in the calling procedure.
HPCPI Product Limitations
115
Multi-Issue Architectures
In multi-issue architectures (those that can execute more than one instruction per cycle), the
interrupt handler associates only one instruction in a bundle with an event. The other instructions
in the bundle have no associated events. This can skew the attribution of events to instructions.
Calls to exec()
If a process uses the exec() system call or its variants, HPCPI can attribute events to the wrong
image and it is possible to get samples for unexecuted instructions. Samples gathered prior to
an exec() call might be attributed to the image loaded by the exec() call. In most applications,
a process typically calls exec() soon after being created, so HPCPI collects only a few samples
prior to the exec() call and only a few samples might be attributed to the wrong image.
Unknown Locations
HPCPI is sometimes unable to map an instruction pointer value to a known code region. This is
often caused by a call to a routine outside the current object module using a trampoline, or by
code that the compiler or linker generates automatically and that does not exist within any
routine in the source object module. Runtime compilers, such as Java™ Just-in-Time (JIT) compilers, also
generate executable code that is unknown to HPCPI. In these cases, the HPCPI analysis tools
map these instructions to the routine unknown_rou.
Mandated Duty Groups
Some events cannot be monitored by the PMU at the same time, so HPCPI creates additional
duty groups as required. This restriction is based on requirements from the processor
manufacturers. For example, Itanium processors require separate groups for LOADS_RETIRED
and STORES_RETIRED.
Active Fraction Changes
The number of event groups can change while hpcpid is running if the xperf utility starts. If
this occurs, an event can have multiple sampling interval and/or active fraction values during
the reporting period. HPCPI calculates one effective sampling interval and one effective active
fraction value for each event for the entire time period. HPCPI maintains event counts and other
counts that are unaffected by changing values of the sampling interval and active fraction. It
uses these counts to derive effective values for the reported sampling interval and active fraction.
B HPCPI Quick Reference
This appendix contains quick reference information for basic HPCPI tasks.
Starting HPCPI
Table B-1 Starting HPCPI

To Perform this Task           Use this Command                  Reference
Set up the HPCPI environment   module load hpcpi                 “Setting Up the HPCPI
                                                                 Environment” (page 35)
Select a directory for the     n/a                               “Selecting a Location for the
HPCPI database                                                   HPCPI Database Directory”
                                                                 (page 36)
Set the default HPCPI          setenv HPCPIDB directory          “Setting the Default Database
database directory             (or the equivalent command for    Directory Environment
                               your user shell)                  Variable (HPCPID)” (page 36)
Start the HPCPI daemon         hpcpid                            “Starting the hpcpid Daemon”
                                                                 (page 36)
Select events to be monitored  hpcpid -events event[,event]...   “Selecting Events to Monitor”
                                                                 (page 37)
Display event names            hpcpid -show-events               “Selecting Events to Monitor”
                                                                 (page 37)
Display event set names        hpcpid -show-event-sets           “Selecting Events to Monitor”
                                                                 (page 37) and Table 4-1
                                                                 (page 38)
Stopping and Controlling HPCPI
Table B-2 Stopping and Controlling HPCPI

To Perform this Task           Use this Command   Reference
Flush HPCPI data to disk       hpcpictl flush     “Flushing Data to Disk: hpcpictl flush”
                                                  (page 41)
Stop the HPCPI daemon          hpcpictl stop      “Stopping the Daemon: hpcpictl quit”
                                                  (page 41)
Start a new epoch              hpcpictl epoch     “Starting a New Data Epoch: hpcpictl
                                                  epoch” (page 41)
Display status information     hpcpictl show      “Displaying HPCPI Status Information:
about hpcpid; display the                         hpcpictl show” (page 41)
current database and epoch
names
Viewing HPCPI Data
Table B-3 Viewing HPCPI Data

To Perform this Task           Use this Command                      Reference
Display per-image data         hpcpiprof                             “Viewing Per-Image Data:
                                                                     hpcpiprof” (page 44)
Display per-procedure data     hpcpiprof image_name                  “Viewing Per-Procedure Data:
                                                                     hpcpiprof image_name”
                                                                     (page 46)
Display per-instruction data   hpcpilist procedure_name image_name   “Viewing Per-Instruction Data:
                                                                     hpcpilist procedure_name
                                                                     image_name” (page 47)
Display the instructions       hpcpitopcounts                        “Listing the Instructions with
with the highest event                                               the Highest Event Counts:
counts                                                               hpcpitopcounts” (page 49)
Display the instructions       hpcpitopcounts image_name             “Listing Instructions in an
with the highest event                                               Image: hpcpitopcounts
counts in an image                                                   image_name” (page 50)
Display raw profile data       hpcpicat                              hpcpicat(1)
Controlling Input and Output for HPCPI Utilities
Table B-4 Controlling Input and Output for HPCPI Utilities

To Perform this Task         Use this Option with hpcpiprof,             Reference
                             hpcpilist, or hpcpitopcounts
Specify an alternate         -db db_directory                            “Specifying an Alternate
database                                                                 Database” (page 51)
Select an epoch              -epoch name                                 “Specifying an Alternate
                             -epoch latest-k                             Epoch” (page 51)
Select output by system      -hosts hostname[,hostname]...               “Selecting Data by System”
                             -hosts all-hostname[,hostname]...           (page 51)
Select output by label       -label label_name [-label label_name...]    “Selecting Data by Label”
                                                                         (page 52)
Select output events         -event event[,event]...                     “Specifying Events to Display”
                             -event all-event[,event]...                 (page 52)
Specify an alternate sort    -st event_name                              “Specifying an Alternate Sort
key                                                                      Key” (page 53)
Limit hpcpiprof output by    -keep percentage                            “Limiting the hpcpiprof
cumulative percentage        (Only valid with the hpcpiprof              Output” (page 54)
                             command.)
Display raw values           -raw-numbers                                “Displaying Raw Values”
                                                                         (page 53)
Create HTML output           -output-format html                         “Additional Options” (page 54)
Suppress header output       -no-header                                  “Additional Options” (page 54)
C Xtools Quick Reference
This appendix contains quick reference information for Xtools.
xclus and xcxclus Tasks
This section contains quick reference information for basic xclus and xcxclus tasks.
Starting xclus or xcxclus
Table C-1 Starting xclus or xcxclus

To Perform this Task            Use this Command or Procedure                Reference
Set up the Xtools environment   module load xtools                           “Step 1: Setting Up the Xtools
                                                                             Environment” (page 78)
Set the DISPLAY environment     setenv DISPLAY display_name                  “Step 2: Setting the DISPLAY
variable                        (or the equivalent command for your          Environment Variable”
                                user shell)                                  (page 78)
Start xclus or xcxclus and      xclus|xcxclus                                “Step 3: Starting xclus or
monitor the nodes in your job                                                xcxclus” (page 78)
allocation
Start xclus or xcxclus and      Use one of the following commands to         “Specifying Nodes for xclus
monitor a subset of the nodes   start the utility and specify the nodes      or xcxclus” (page 79)
in your job allocation          you want to monitor:
                                • xclus|xcxclus -nodes
                                  node_specification...
                                • xclus|xcxclus -cluster
                                  cluster_filename
                                  Where cluster_filename contains the
                                  names of the nodes or systems you want
                                  to monitor
Start xclus or xcxclus and      Specify the nodes or systems you want to     “Specifying Nodes for xclus
monitor nodes outside your      monitor as described previously and          or xcxclus” (page 79)
job allocation                  specify the option
                                -unrestricted-nodes. You must have
                                superuser capabilities to use this option
                                with xcxclus.
Start xclus on a standalone     You must specify the nodes or systems you    “Specifying Nodes for xclus
system                          want to monitor. Use one of the following    or xcxclus” (page 79)
                                commands:
                                • xclus -nodes node_specification...
                                • xclus -cluster cluster_filename
                                  Where cluster_filename contains the
                                  names of the nodes or systems you want
                                  to monitor
                                • xclus
                                  Where the file named cluster in the
                                  current working directory contains the
                                  names of the nodes or systems you want
                                  to monitor
Modifying xclus and xcxclus Displays
Table C-2 Modifying xclus or xcxclus Displays
To Perform this Task | Use this Procedure | Reference
Show names and descriptions for an icon area | Move your mouse over the icon area. | “Showing Statistic Names and Descriptions” (page 86)
Show bandwidth rates instead of utilization rates | Select the menu option Options→bus_name→Bandwidth. | “Showing Bandwidth or Utilization Rates” (page 86)
Show HyperTransport statistics for data packets only (useful for confirming data rates) instead of statistics for data and control packets | Select the menu option Options→HT All.vs.Data→Data. | “Showing HyperTransport Data Statistics or Data and Control Statistics” (page 86)
Change the refresh rate | Select the menu option Options→Refresh. | “Changing the Refresh Rate” (page 86)
Hide the statistic values | Select View→Values from the menu at the top of the display and deselect Values. | “Hiding Statistic Values” (page 86)
Suspend the display | Select the menu option Hold. | “Suspending the Display” (page 86)
Modify the size of the display | Use one of the following methods: select the menu option View→Zoom, or set the X11 resource *xclus.zoom before starting xclus or xcxclus | “Modifying the Display Size and Layout” (page 86) and the EXTERNAL INFLUENCES section of xclus(1) and xcxclus(1)
Modify the number of icons per row | Use one of the following methods: specify the -row-width option when you start xclus or xcxclus, or set the X11 resource *xclus.nodesPerRow before starting xclus or xcxclus | “Modifying the Display Size and Layout” (page 86) and the EXTERNAL INFLUENCES section of xclus(1) and xcxclus(1)
Display generic data with xclus | Specify the -generic option when you start xclus. | “Viewing Generic Data with xclus or xperf” (page 109)
Display enhanced data with xcxclus | Specify the -enhanced option and either the -apmond or -clusmond option when you start xcxclus. | “Viewing Enhanced Data with xcxclus or xcxperf” (page 110)
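The X11 resources named above must be set before the utility starts. As a sketch, you might add lines like the following to your X resources file (for example, ~/.Xdefaults) and merge them with xrdb; the values shown are illustrative only, and the values each resource accepts are described in the EXTERNAL INFLUENCES section of xclus(1) and xcxclus(1):

*xclus.zoom: 2
*xclus.nodesPerRow: 8

xrdb -merge ~/.Xdefaults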
Recording, Replaying, and Plotting xclus or xcxclus Data
Table C-3 Recording, Replaying, and Plotting xclus or xcxclus Data
To Perform this Task | Use this Command | Reference
Save output to a file | Specify the -output data_file_prefix option when you start xclus or xcxclus. | “Recording, Replaying, and Plotting xclus and xcxclus Data” (page 89)
Replay data | Specify the data_file_prefix of the output file when you start xclus or xcxclus. | “Recording, Replaying, and Plotting xclus and xcxclus Data” (page 89)
Plot the data | Use the following syntax to run xclus or xcxclus: xclus|xcxclus -plot plot_file_prefix data_file_prefix | “Recording, Replaying, and Plotting xclus and xcxclus Data” (page 89)
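For example, a record-replay-plot sequence might look like the following sketch, where runA and runA_plots are illustrative file name prefixes:

xcxclus -output runA
xcxclus runA
xcxclus -plot runA_plots runA

The first command records the monitoring session to files named with the prefix runA, the second replays the recorded data, and the third plots it.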
xperf and xcxperf Tasks
This section contains quick reference information for basic xperf and xcxperf tasks.
Starting xperf or xcxperf
Table C-4 Starting xperf or xcxperf
To Perform this Task
Use this Command or Procedure
Reference
Set up the Xtools
environment
module load xtools
“Step 1: Setting Up the Xtools
Environment” (page 78)
Set the DISPLAY
environment variable
setenv DISPLAY display_name
“Step 2: Setting the DISPLAY
Environment Variable”
(page 78)
(or the equivalent command for your user shell)
Start xperf or xcxperf Enter the xperf or xcxperf command.
from the command line
“Starting xperf and xcxperf”
(page 94)
Start xperf or xcxperf Click on a node icon from the xclus or xcxclus
from xclus or xcxclus display.
“Starting xperf or xcxperf
from xclus or xcxclus”
(page 92)
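If your user shell is sh or bash rather than csh, the equivalent of the setenv command shown above is export. As a sketch (the display name is illustrative):

module load xtools
export DISPLAY=mydesk:0.0
xperf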
Modifying xperf and xcxperf Displays
Table C-5 Modifying xperf or xcxperf Displays
To Perform this Task | Use this Procedure | Reference
Display the color legend for a graph | Select the graph name from the menu at the top of the display. | “Displaying Color Legends and Creating Tear-Away Legends” (page 104)
Create a tear-away legend for a graph | Select the graph name from the menu at the top of the display, then select the tear-away icon (the perforated line) from the submenu. | “Displaying Color Legends and Creating Tear-Away Legends” (page 104)
Hide or show graphs | Select the menu option Options→Hide/Show, then toggle the setting for the graph name. | “Hiding or Showing Graphs” (page 104)
Show I/O bandwidth rates instead of utilization rates | Select the menu option Options→I/O→Bandwidth. | “Showing I/O Bandwidth or Utilization Rates” (page 104)
Show cycles per instruction instead of instructions per cycle | Select the menu option Options→Instructions→CPI. | “Showing Cycles Per Instruction or Instructions Per Cycle” (page 104)
Modify graph colors or line widths | Modify the X11 resource *xperf*graph_name.colors or *xperf*graph_name.lineWidth for the graph before starting xperf or xcxperf. To determine the graph name, enter the command xperf -graphs or xcxperf -graphs. | “Modifying Graph Colors and Line Widths” (page 104)
Display generic data with xperf | Specify the -generic option when you start xperf. | “Viewing Generic Data with xclus or xperf” (page 109)
Display enhanced data with xcxperf | Specify the -enhanced option and either the -apmond or -clusmond option when you start xcxperf. | “Viewing Enhanced Data with xcxclus or xcxperf” (page 110)
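As a sketch of customizing a graph, you might first list the available graph names and then set a line-width resource for one of them before restarting the utility. The graph name cpu and the value 2 below are hypothetical; confirm real graph names with xperf -graphs, and see “Modifying Graph Colors and Line Widths” (page 104) for the accepted values:

xperf -graphs
echo '*xperf*cpu.lineWidth: 2' | xrdb -merge -
xperf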
Additional xperf and xcxperf Tasks
Table C-6 Additional xperf and xcxperf Tasks
To Perform this Task | Use this Procedure | Reference
Start an HPCPI label from xperf | Select the menu option HPCPI→Start Label. | “Starting an HPCPI Label from xperf” (page 106)
Stop the HPCPI label | Select the menu option HPCPI→Stop Label. | “Starting an HPCPI Label from xperf” (page 106)
View the HPCPI label data | Select the menu option HPCPI→View Report Last Label or HPCPI→View Report. | “Starting an HPCPI Label from xperf” (page 106)
Copy the HPCPI label database | Select the menu option HPCPI→Copy Database. | “Starting an HPCPI Label from xperf” (page 106)
Record, replay, and plot output | The command options for xperf and xcxperf are the same as the options for xclus and xcxclus. | Table C-3 (page 120)
Display system information | Select the menu option Options→System Information. | “Displaying System Information with xperf or xcxperf” (page 108)
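Because xperf and xcxperf take the same recording options as xclus and xcxclus, a sketch parallel to the example following Table C-3 might look like the following (the file name prefixes are illustrative):

xperf -output nodeA_run
xperf -plot nodeA_plots nodeA_run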
Glossary
active fraction
The fraction of time an event was active in the PMU.
See also duty group.
duty group
A group of HPCPI events, used to multiplex the events being monitored. If hpcpid is monitoring
more events than the number of event counters available in the processor PMU, hpcpid places
the events in duty groups and multiplexes (cycles through) the duty groups so that only the
events in one duty group are monitored at any given time. For example, if the PMU has four
counters and eight events are being monitored, hpcpid can alternate between two duty groups
of four events each, so each event is active for roughly half of the run (an active fraction of
about 0.5).
enhanced statistics
Statistics that are processor-dependent. By default, the xclus and xperf utilities display
enhanced statistics.
epoch
A time-based division of HPCPI data. By default, the HPCPI daemon starts a new epoch each
time it runs. The HPCPI database contains a different subdirectory for each epoch.
generic statistics
Statistics that are processor-independent. By default, the xcxclus and xcxperf utilities display
generic statistics.
golden client
The node from which a standard file system image is created. The golden image is distributed
by the image server. In a standard HP XC installation, the head node acts as the image server
and golden client.
golden image
A collection of files, created from the golden client file system, that is distributed to one or
more client systems. Specific files on the golden client may be excluded from the golden image
if they are not appropriate for replication.
golden master
The collection of directories and files that represents all of the software and configuration data
of an HP XC system. The software for any and all nodes of an HP XC system can be produced
solely by the use of this collection of directories and files.
head node
The single node that is the basis for software installation, system configuration, and
administrative functions in an HP XC system. There may be another node that can provide a
failover function for the head node, but an HP XC system has only one head node at any one time.
image server
A node specifically designated to hold images that will be distributed to one or more client
systems. In a standard HP XC installation, the head node acts as the image server and golden
client.
job allocation
Nodes allocated to the user by the SLURM, LSF-HPC, or RMS subsystem. Also referred to as
node allocation.
label
An identifier for HPCPI data, created using the hpcpictl label command.
LSF execution host
The node on which LSF runs. A user's job is submitted to the LSF execution host. Jobs are
launched from the LSF execution host and are executed on one or more compute nodes.
LSF-HPC with SLURM
Load Sharing Facility for High Performance Computing integrated with SLURM. The batch
system resource manager on an HP XC system that is integrated with SLURM. LSF-HPC with
SLURM places a job in a queue and allows it to run when the necessary resources become
available. LSF-HPC with SLURM manages just one resource: the total number of processors
designated for batch processing.
LSF-HPC with SLURM can also run interactive batch jobs and interactive jobs. An LSF interactive
batch job allows you to interact with the application while still taking advantage of LSF-HPC
with SLURM scheduling policies and features. An LSF-HPC with SLURM interactive job is run
without using LSF-HPC with SLURM batch processing features but is dispatched immediately
by LSF-HPC with SLURM on the LSF execution host.
See also LSF execution host.
MPI
Message Passing Interface. A library specification for message passing, proposed as a standard
by a broadly based committee of vendors, implementors, and users.
node allocation
Nodes allocated to the user by the SLURM, LSF-HPC, or RMS subsystem. Also referred to as
job allocation.
RMS
Resource Management System. A set of commands for running parallel programs and monitoring
their execution. The set includes utilities that determine what resources are available and
commands that request allocation of resources.
RPM
Red Hat Package Manager.
1. A utility that is used for software package management on a Linux operating system, most
notably to install and remove software packages.
2. A software package that is capable of being installed or removed with the RPM software
package management utility.
SLURM
Simple Linux Utility for Resource Management. A set of commands for system resource
management and job scheduling.
Index
A
active fraction, 114, 116
in HPCPI output, 44
AMD Opteron
branch statistics displayed by xperf, 100
CPU statistics displayed by xperf, 99
CPU utilization display, 83, 84
data cache statistics displayed by xperf, 99
DRAM statistics displayed by xperf, 100
DRAM utilization display, 83, 84
execution statistics displayed by xperf, 99
floating point operation statistics displayed by xperf,
99
HyperTransport link utilization display, 83, 84
HyperTransport statistics displayed by xperf, 100
instruction cache statistics displayed by xperf, 99
IPC statistics displayed by xperf, 99
local statistics displayed by xperf, 100
memory statistics displayed by xperf, 100
remote statistics displayed by xperf, 100
statistics displayed by xperf, 98
supported processors, 20
xclus display, 20, 83, 84
xclus statistics, 21
xperf display, 98
xperf statistics, 23
-and operator for HPCPI label selectors, 63
apmond daemon, 111
assembly code
in hpcpilist output, 47
B
bandwidth
displaying in xperf or xcxperf, 104
bandwidth rates
displaying with xclus or xcxclus, 86
branch
AMD Opteron statistics displayed by xperf, 100
Itanium statistics displayed by xperf, 97
branch events
event set for monitoring, 38
branch targets
viewing, 47
BranchEvents event set, 38
BranchEvents2 event set, 38
bsub command
using with HPCPI, 71
C
cache
data
event set for monitoring, 38
instruction
event set for monitoring, 38
Itanium statistics displayed by xperf, 96
cache misses
event set for monitoring, 38
CacheMissEvents event set, 38
clusmond daemon, 111
cluster file for xclus or xcxclus, 79
clusters
installing HPCPI and Xtools on, 27
using HPCPI on, 69
colors
changing for xperf and xcxperf graphs, 104
context switch
statistics displayed by xcxperf, 102
CPU
affinity requirement for measuring memory controller
and HyperTransport events, 57
AMD Opteron
utilization rate display, 83, 84
AMD Opteron statistics displayed by xperf, 99
Itanium
utilization rate display, 82
Itanium statistics displayed by xperf, 96
statistics displayed by xcxperf, 101
using to select HPCPI data, 62
utilization rate display, 85
-cpu selector for HPCPI labels, 62
-create-epoch option for hpcpid, 70
cumulative percentage
in HPCPI output, 45
cycles per instruction
displaying in xperf or xcxperf, 104
D
daemons
Xtools, 111
data cache
AMD Opteron statistics displayed by xperf, 99
event set for monitoring, 38
database
organizing HPCPI data with, 56
database directory
specifying alternate for HPCPI utilities, 51
structure, 113
-db option for HPCPI utilities, 51
DCacheEvents event set, 38
directory
binary
for HPCPI, 31, 35
for Xtools, 78
database
for HPCPI, 31, 36, 113
manpage
for HPCPI, 35
for Xtools, 78
/var/opt/xtools/log/db, 106
disk throughput
statistics displayed by xcxperf, 102
DISPLAY environment variable
setting for Xtools, 78
DMA bus
Itanium statistics displayed by xperf, 97
-doflush
option for hpcpid, 72
DRAM
AMD Opteron statistics displayed by xperf, 100
utilization display, 83, 84
duty group
overview, 19
recommendation, 56
E
elan (see Quadrics QsNet)
-enhanced option
for xcxclus or xcxperf, 110
epilog file for HPCPI, 72
epoch
comparison with label, 64
displaying, 41
organizing HPCPI data with, 56
specifying alternate for HPCPI utilities, 51
starting, 36, 41
terminating after creating, 70
using the existing, 70
-epoch option
for HPCPI utilities, 51
for hpcpid, 70
-equiv operator for HPCPI label selectors, 63
Ethernet throughput
statistics displayed by xcxperf, 102
event
displaying valid names, 37
duty qualifier, 39
in HPCPI output, 44
interval, 38
event count
in HPCPI output, 44
event groups, 37
event interval
recommendation, 56
event sets
commonly used, 38
displaying valid, 37
-events option
for hpcpid, 37
execution
AMD Opteron statistics displayed by xperf, 99
F
file name format for HPCPI, 113
floating point
event set for monitoring, 38
floating point operation
AMD Opteron statistics displayed by xperf, 99
Itanium statistics displayed by xperf, 96
flushing HPCPI data, 32, 41
Found
in HPCPI output, 46
FPC
statistics displayed by xperf, 96
FPUEvents event set, 38
front side bus (see FSB bus)
FSB bus
utilization rate display, 82
G
-generic option
for xclus or xperf, 109
golden image
using for installation, 28
grace delay parameter for node grouping, 93
grace parameter for node grouping, 93
graphs
selecting for xperf or xcxperf display, 104
-group-nodes option for xclus and xcxclus, 92
grouped nodes
in xclus or xcxclus, 92
modifying parameters for, 93
H
HelpMeEvents event set, 38
hiding values in xclus and xcxclus displays, 86
holding the xclus and xcxclus displays, 86
-hosts option for HPCPI utilities, 51, 70
selecting HPCPI data by, 70
HP-LSF
using HPCPI with, 70
HPCEvents event set, 38
HPCPI
binary directory, 31, 35
clusters, using on, 69
components, 17
daemon, 32
starting, 36
stopping, 33
database
organizing data with, 56
synchronizing on cluster, 70
database directory, 37
determining, 41
environment variable, 31, 36
requirements, 36
setting the default location, 36
specifying for utilities, 51
description, 17
directories
binary, 31
database, 36
manpage, 35
displaying valid event names, 37
epoch (see epoch)
event groups, 37
file name format, 113
flushing data, 32, 41
label, 40
organizing data with, 56
loading the environment, 31, 35
log file, 37
manpage directory, 35
organizing data, 56
product limitations, 115
sampling characteristics, 18
selecting events for monitoring, 37
simple session, 31
stopping, 41
tips, 55
viewing instructions with highest counts, 49
viewing per-image statistics, 32, 44
viewing per-instruction statistics, 47
viewing per-line statistics, 33
viewing per-procedure statistics, 33, 46
HPCPI epoch (see epoch)
hpcpi RPM package, 26
hpcpicat utility, 18
output, 114
hpcpictl flush command, 41
hpcpictl label command, 66
(see also label)
basic syntax, 40, 59
nesting, 66
hpcpictl quit command, 41
hpcpictl show command, 41
hpcpid
-create-epoch option, 70
-doflush option, 72
-epoch option, 70, 72
-terminate-with option, 72
hpcpid daemon
starting, 32, 36
stopping, 33, 41
HPCPID environment variable, 36
hpcpilist utility
-db option, 51
disabling scientific notation, 53
-epoch option, 51
-event option, 52
header, 44, 54
-hosts option, 51, 70
HTML output, 54
-label option, 52
-no-header option, 54
-output-format option, 54
-raw-numbers option, 53
selecting data by system, 51
selecting labeled data, 52
-st option, 53
sorting output, 53
viewing per-instruction statistics, 47
viewing per-line statistics, 33
hpcpiprof utility
-db option, 51
disabling scientific notation, 53
-epoch option, 51
-event option, 52
header, 44, 54
-hosts option, 51, 70
HTML output, 54
-keep option, 54
-label option, 52
limiting output, 54
-no-header option, 54
-output-format option, 54
-raw-numbers option, 53
selecting data by system, 51
selecting labeled data, 52
-st option, 53
sorting output, 53
viewing per-image statistics, 32, 44
viewing per-procedure statistics, 33, 46
hpcpitopcounts utility
-db option, 51
disabling scientific notation, 53
-epoch option, 51
-event option, 52
header, 44, 54
-hosts option, 51, 70
HTML output, 54
-label option, 52
-no-header option, 54
-output-format option, 54
-raw-numbers option, 53
selecting data by system, 51
selecting labeled data, 52
-st option, 53
sorting output, 53
viewing instruction counts, 49
HTML output
for HPCPI utilities, 54
HyperTransport
AMD Opteron statistics displayed by xperf, 100
displaying data statistics with xclus or xcxclus, 86
utilization display, 83, 84
HyperTransport events
determining transmit and receive events, 57
requirements for accurate HPCPI measurements, 57
I
I/O bus
Itanium statistics displayed by xperf, 97
utilization rate display, 82, 85
IA64_INST_RETIRED
measuring, 57
ICacheEvents event set, 38
image
viewing HPCPI statistics for, 32, 44
image replication
using for installation, 28
Infiniband throughput
statistics displayed by xcxperf, 102
installing
on existing systems, 27
on HP XC clusters, 27
full imaging procedure, 28
manual propagation procedure, 28
running RPM on clients procedure, 29
on standalone systems, 27
requirements, 25
instruction
viewing HPCPI statistics for, 33, 47
instruction cache
AMD Opteron statistics displayed by xperf, 99
event set for monitoring, 38
instruction counts
viewing, 47, 49
instruction pointer
in hpcpilist output, 47
instructions per cycle
event set for, 38
Itanium, 57
interrupt
statistics displayed by xcxperf, 102
IPC
AMD Opteron statistics displayed by xperf, 99
Itanium statistics displayed by xperf, 96
IPCEvents event set, 38
Itanium
branch statistics displayed by xperf, 97
cache statistics displayed by xperf, 96
CPU statistics displayed by xperf, 96
CPU utilization rate display, 82
DMA bus statistics displayed by xperf, 97
floating point operation statistics displayed by xperf,
96
FSB bus utilization rate display, 82
I/O bus statistics displayed by xperf, 97
I/O bus utilization rate display, 82
instruction metrics, 57
IPC statistics displayed by xperf, 96
MID bus utilization rate display, 82
scoreboard statistics displayed by xperf, 96
statistics displayed by xperf, 96
supported processors, 20
system bus statistics displayed by xperf, 97
TLB statistics displayed by xperf, 96
xclus display, 81, 82
xclus statistics, 21
xperf display, 21
xperf statistics, 23
K
kernel idle process
selecting HPCPI data from, 66
L
label
-and operator, 63
application arguments and, 65
basic syntax for creating, 40, 59
comparison with epoch, 64
controlling duration of, 62
environment variables and, 65, 67
-equiv operator, 63
example session, 60
examples, 65–66
for specific code areas, 67
in program, 67
invoking in program, 67
kernel idle data and, 66
multiple, 64
negating, 63
-not operator, 63
operand processing, 63
-or operator, 63
organizing HPCPI data with, 56
overview, 59
reusing, 64
selecting processes by CPU, 62
selecting processes by parent PID, 62
selecting processes by PID, 62
selecting processes by process group ID, 62
selecting processes by session ID, 62
selecting processes by UID, 62
selectors, 62
specifying for HPCPI utilities, 52
starting from xperf, 106
labeling data, 40
layout
modifying for xclus or xcxclus, 86
legends
displaying for xperf or xcxperf graphs, 104
limiting hpcpiprof output, 54
line width
changing for xperf and xcxperf graphs, 104
Linux
supported versions, 25
loading
the HPCPI environment, 31
the Xtools environment, 78
local
AMD Opteron statistics displayed by xperf, 100
LSF
using HPCPI with, 70
Lustre throughput
statistics displayed by xcxperf, 102
M
memory
AMD Opteron statistics displayed by xperf, 100
statistics displayed by xcxperf, 103
vmalloc() statistics displayed by xcxperf, 103
memory controller events
requirements for accurate HPCPI measurements, 57
memory interface data bus (see MID bus)
menu options
for xclus, 87
for xcxclus, 87
MID bus
utilization rate display, 82
module load
for HPCPI, 31, 35
for Xtools, 78
mond daemon, 111
MPI
and HPCPI labels, 73
using HPCPI with, 70
mpirun
and HPCPI labels, 73
using HPCPI labels with, 69
N
negating HPCPI label selectors, 63
NFS
statistics displayed by xcxperf, 102
-no-group-nodes option for xclus and xcxclus, 92
-node-grouping-threshold option for xclus and xcxclus,
92
-nodes option for xclus and xcxclus, 79
NOP_RETIRED
measuring, 57
-not operator for HPCPI label selectors, 63
not found
in HPCPI output, 46
O
Opteron (see AMD Opteron)
-or operator for HPCPI label selectors, 63
-output option
for xclus and xcxclus, 89
for xperf and xcxperf, 107
P
parent PID
using to select HPCPI data, 62
Period
in HPCPI output, 44
-pgid selector for HPCPI labels, 62
using for utilities that spawn processes, 65
PID
parent
using to select HPCPI data, 62
using to select HPCPI data, 62
-pid selector for HPCPI labels, 62
using for existing process, 65
using to select all processes, 65
-plot option
for xclus and xcxclus, 89
for xperf and xcxperf, 107
plotting
xclus or xcxclus data, 89
-ppid selector for HPCPI labels, 62
PREDICATE_SQUASHED_RETIRED
measuring, 57
procedure
viewing HPCPI statistics for, 33, 46
process group ID
using to select HPCPI data, 62
process ID (see PID)
processors
supported, 20
prolog file for HPCPI, 71
Q
Quadrics QsNet
statistics displayed by xcxperf, 103
-quiet option for xclus and xcxclus, 89
R
raw numbers
viewing in HPCPI output, 53
refresh rate
changing for xclus or xcxclus, 86
register stack engine (see RSE)
remote
AMD Opteron statistics displayed by xperf, 100
removing software, 30
replaying
xclus or xcxclus data, 89
RMS
using HPCPI with, 70
-row-width option for xclus and xcxclus, 87
RPM packages, 25
RSE
Itanium statistics displayed by xperf, 96
S
Samples
in HPCPI output, 44
saving
xclus or xcxclus data, 89
/sbin/start
starting the HPCPI subsystem with, 29
scientific notation
disabling in HPCPI output, 53
scoreboard
Itanium statistics displayed by xperf, 96
ServerEvents event set, 38
session ID
using to select HPCPI data, 62
shared images
using labels with, 52, 61
-sid selector for HPCPI labels, 62
SLURM
epilog file for HPCPI, 72
prolog file for HPCPI, 71
using HPCPI with, 70
socket
statistics displayed by xcxperf, 102
sorting output for HPCPI utilities, 53
-st option for HPCPI utilities, 53
StallEvents event set, 38
stalls
event set for monitoring, 38
starting
epoch, 36, 41
HPCPI, 36
HPCPI subsystem, 29
hpcpid daemon, 36
xclus, 78
xcxclus, 78
xperf or xcxperf, 94
Xtools subsystem, 29
status
HPCPI, 41
stopping
hpcpid, 41
subsume delay parameter for node grouping, 93
supermon daemon, 111
suppressing output for xclus or xcxclus, 89
suspending the xclus and xcxclus displays, 86
swap
statistics displayed by xcxperf, 103
system bus
Itanium statistics displayed by xperf, 97
system information
displaying with xperf or xcxperf, 108
T
TCP
statistics displayed by xcxperf, 102
-terminate-with
option for hpcpid, 72
tips
HPCPI, 55
TLB
Itanium statistics displayed by xperf, 96
tolerance parameter for node grouping, 93
translation lookaside buffer (see TLB)
U
UDP
statistics displayed by xcxperf, 102
UID
using to select HPCPI data, 62
-uid selector for HPCPI labels, 62
unknown_rou
in HPCPI output, 46, 116
user ID (see UID)
V
/var/opt/xtools/log/db, 106
verifying installation, 30
viewing instruction counts, 49
viewing per-image statistics, 44
viewing per-instruction statistics, 47
viewing per-procedure statistics, 46
vmalloc()
statistics displayed by xcxperf, 103
X
xclus utility
AMD Opteron display, 83, 84
cluster file, 79
-cluster option, 80
comparison with xcxclus, 76
default cluster file, 80
display for AMD Opteron, 20
displaying bandwidth rates, 86
displaying HyperTransport data statistics, 86
-generic option, 109
-group-nodes option, 92
grouped nodes, 92
modifying parameters for, 93
hiding values in display, 86
Itanium display, 82
menu options, 87
-no-group-nodes option, 92
-node-grouping-threshold option, 92
-nodes option, 79
-output option, 89
overview, 76
-plot option, 89
plotting data, 89
processors, 20
-quiet option, 89
refresh rate, 86
replaying data, 89
-row-width option, 87
saving data to file, 89
showing statistic names, 86
size, changing, 86
specifying nodes for, 79
starting, 78–80
statistics, 21
suspending the display, 86
-zoom option, 86
xcxclus utility
cluster file, 79
-cluster option, 80
comparison with xclus, 76
CPU utilization display, 85
data file, 89
displaying bandwidth rates, 86
displaying HyperTransport data statistics, 86
-enhanced option, 110
generic node display, 85
-group-nodes option, 92
grouped nodes, 92
modifying parameters for, 93
hiding values in display, 86
I/O bus utilization display, 85
menu options, 87
-no-group-nodes option, 92
-node-grouping-threshold option, 92
-nodes option, 79
-output option, 89
overview, 76
-plot option, 89
plotting data, 89
processors, 20
-quiet option, 89
refresh rate, 86
replaying data, 89
-row-width option, 87
saving data to file, 89
showing statistic names, 86
size, changing, 86
specifying nodes for, 79
starting, 78–80
statistics, 21
suspending the display, 86
-zoom option, 86
xcxperf utility
bandwidth, displaying, 104
comparison with xperf, 94
-enhanced option, 110
graphs, selecting for display, 104
legends, 104
menu options, 105
-output option, 107
overview, 76, 94
-plot option, 107
processors, 20
starting, 94
starting from xcxclus, 92
statistics, 23
xinetd
and apmond and clusmond, 111
restarting, 29
xperf display
for AMD Opteron, 98
for Itanium, 21
xperf utility
AMD Opteron statistics, 98
bandwidth, displaying, 104
comparison with xcxperf, 94
-generic option, 109
graphs, selecting for display, 104
Itanium statistics, 96
legends, 104
menu options, 104
-output option, 107
overview, 76, 94
-plot option, 107
processors, 20
starting, 94
starting from xclus, 92
statistics, 23
Xtools, 20
(see also xclus utility, xcxclus utility, xperf, and xcxperf)
binary directory, 78
xtools-clients RPM package, 26
xtools-common RPM package, 26
xtools-xc_clients RPM package, 26
Z
-zoom option for xclus and xcxclus, 86