Intel® PerfMon
Performance Monitoring Hardware
Overview
Software & Services Group
PerfMon Basics
• PerfMon is hardware distributed throughout the silicon and exposed through registers to
tools, to facilitate several system/application usages:
– compiler analysis
– workload characterization
– performance tuning and debug
• Two counting paradigms are used
– Global/absolute count
– Sampling on overflow
• Previous incompatibility among processor families has limited its value
– First introduced on the Pentium processor
– Model-specific changes added to P6, Pentium 4, and Xeon processors
• Intel® Core™ Solo processors were the first to support Architectural PerfMon
– Arch PerfMon ensures the same base analysis capabilities across all Intel processors
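The two counting paradigms can be sketched in software. This is a minimal illustrative model, not a real PMU interface: `CounterModel`, `on_event`, and the sample period are hypothetical names chosen for the example. The model keeps a running global count and, separately, records the current code address each time a fixed number of events has elapsed, which is the idea behind sampling on overflow.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical software model of one event counter (illustration only).
struct CounterModel {
    std::uint64_t total = 0;             // global/absolute count
    std::uint64_t period;                // events between samples (assumed > 0)
    std::uint64_t until_overflow;        // events left before the modeled overflow
    std::vector<std::uint64_t> samples;  // code addresses captured at each overflow

    explicit CounterModel(std::uint64_t sample_period)
        : period(sample_period), until_overflow(sample_period) {}

    // Record one event that occurred while executing at `address`.
    void on_event(std::uint64_t address) {
        ++total;
        if (--until_overflow == 0) {
            samples.push_back(address);  // sampling on overflow
            until_overflow = period;     // re-arm for the next sample
        }
    }
};
```

With a period of 100, feeding 1000 events gives a total of 1000 and 10 samples. A tool would then attribute events to code by building a histogram of the sampled addresses.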
Performance Tuning Methodology
• Top-down iterative approach is used at every ‘level’
System Level
• Process/Memory Utilization
• Network
• Disk & IO
• Context Switches/Paging/System Calls
Application Level
• Locks
• Heaps
• Parallelism / Execution Threads
• APIs
Microarchitecture Level (PerfMon)
• Processor Stalls
• Branch Prediction
• Data and code alignment
• Glass Jaws
Tools*: vmstat, NetMon, MS PerfMon, APImon, Quantify, VTune, PTU, xIF
*Many customer & open source tools
PerfMon is the hardware behind all microarchitecture level tuning/analysis
Intel® Tools
• PerfMon based analysis tools methodology: prototype methodologies are released in free
analysis tools, followed by full production support in releases of Intel® VTune™ Analyzer.
• Prototype tools available @ whatif.intel.com
– Available today: PTU (Performance Tuning Utility)
• Much of the All New VTune™ Analyzer is based on PTU.
– xIF (platform x Issue Finder – microarchitecture focus)
• Beta is due out next week.
• Production is due early July internally.
• Production externally is due in September.
x Issue Finder (xIF)
Analyzer for Microarchitecture Tuning
• xIF breaks down code into streams
• Each stream is analyzed against ‘best case’ performance
using Intel preset platform ‘Issue List’ to prioritize
tuning efforts to find:
– Locks & Waits – Where is my program waiting or overusing system calls?
– Front-End Issues – Find the compiler opportunities
– Cache analysis & false sharing
– Memory/Data Stalling
– Branching issues
– More…Code/Data Layout, …
Market Leading Software Tools
Optimize Multicore Performance
Focus tool utilizing PerfMon Hardware
Intel’s Piersol HE Profiler
All New VTune™ Analyzer
Product Features
• New User Interface
• Next Generation Collection Technology
• Integrated Threading Timeline
Intel’s HE Studio: Static Security Analysis, Dynamic Memory and Thread Analysis
Supported Environments: C/C++, Fortran and .NET; Windows, Linux, Microsoft Visual Studio 2005, 2008, 2010
Availability: Q4/2010
Intel’s Piersol HE Profiler
Performance Analysis System
• Windows* & Linux*
• GUI, Command line, Visual Studio integration
• No special compilers or recompile
Setup
• Presets Simplify Setup
Collect
• Multiple Data Collectors
Analyze
• Multiple Views Simplify Analysis
Intel’s Piersol HE Profiler
Preset Experiments Simplify Setup
Basic Presets – Run these first
• Hotspot – Where is my app spending time?
• Concurrency – Where is my concurrency poor?
• Locks & Waits – Where is my program waiting?
Advanced Presets – Get more detail
• Tuning Decision Tree – Find the opportunities
• Cache analysis & false sharing
• Branching issues
• More… Structural hazards, …
Custom – Full Control
• All events – Detailed micro-architectural analysis
• All parameters
Intel’s Piersol HE Profiler
Multiple Views Simplify Analysis
Grid View – Sort Tabular Data
• Display call tree and event data
• Color coded CPU utilization
Timeline View – Thread Transitions & CPU Utilization
• See transitions
• See utilization
• Visualize Frames
Source/Asm View – See Data on Source
• Line by line data
• ASM & source side by side
• Basic blocks
Cross Selection & Filters – Quickly find the answer you need
• Double click in grid to open source / asm
• Select in timeline to highlight functions in grid
• Select in source to highlight asm basic blocks
• Filter by module / process / function / frame
Intel’s Piersol HE Profiler
All New VTune™ Performance Analyzer
What’s New | So What?
Presets for performance experiments | Much easier setup
Statistical call tree | Faster startup and faster run
Threading timeline | Visualize thread transitions, PMU data,…
Cross-select and filter results | Find the data you need faster
Attach to a running process on Windows | Profile your app without restarting
Better source/asm view w/ heat map | Easier navigation, “skid” compensation
Frame based analysis | Games & graphics – find the slow frame; Transactions – find slow transaction
Event multiplexing | More data with fewer runs
Uncore support | Processor wide metrics (e.g., bandwidth)
Linux* – no EntireX*, no Eclipse* required | Fewer conflicts. Lighter footprint.
Drivers not required unless using EBS | Root only required for driver install
Smart support for TBB & OpenMP* | Display user names / source constructs
Compiler Professional Ed. 12.0 Preview
• Support for new and emerging OSes:
– Red Hat* EL 6
– Ubuntu* 10
• Support for new IDE versions, e.g.
– Microsoft Visual Studio* 2010
– Eclipse* CDT 6.0 (C++ only)
• Support for new processors, e.g.
– Sandy Bridge (enhanced AVX support)
• Timeline
– Beta targeted for late May or June 2010
– Product release targeted for Q4 2010
– Request beta access from [email protected]
Product names, features, and target dates subject to change
Intel’s C++ and Fortran Compilers
Performance Compilers
What’s New | So What?
Co-Array Fortran | Data decomposition in Fortran syntax
Fortran 2003 and 2008 support; C++0x and C99 new features | Support of the latest Fortran and C++ standards continues
Latest processor support (string instructions for SSE4.2 and AVX support) | Get the best performance on the latest processors
Cilk | Simple keywords for task parallelism
New array syntax for C/C++ | More readable/optimizable array operations
Vectorization and auto-parallelization improvements | Compiler can vectorize and parallelize more often, giving increased performance
Loop profile | Profile at the loop (not just function level)
Statement specific inlining pragmas | More control of inlining
Improved compile time | Lower cost of development
Matrix multiply call to MKL | Faster matrix multiplies without code change
Compiler Pro 12.0 New Fortran Features
• Support for Co-Array Fortran
– Shared memory support in Compiler Professional Edition
– Distributed & shared memory support in Cluster Tool suite
– Uses Intel® MPI technology
– Can’t mix with OpenMP or explicit MPI calls
• Other Fortran 2008 features
– DO CONCURRENT
– CONTIGUOUS
– I/O enhancements
– New constants in ISO_FORTRAN_ENV
– New intrinsic functions
• Fortran 2003 Support
– Complete type-bound procedures (GENERIC, OPERATOR,..)
– FINALization
Compiler Pro 12.0 – New C++ Features
• More C++0x and C99 features
– E.g. rvalue references
• reduce temporary copies
– Maintain Microsoft Visual Studio* compatibility
• Optimized string intrinsics
– Using SSE4.2 instructions
• New array syntax for C/C++
– More readable; helps the compiler vectorize and parallelize
– Somewhat similar to Fortran90 concept
• <array base> [ <lower bound> : <length> [ : <stride>] ]
• E.g. a[0:s] += b[2:s:2]
• Incorporation of Cilk technology
– Simple keywords, especially for task parallelism
– “Hyperobjects” to facilitate thread-safe access
• Provide thread safe reduction operations
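As a reading aid for the array syntax, the example `a[0:s] += b[2:s:2]` can be written as an ordinary loop. This sketch assumes the `<lower bound> : <length> : <stride>` interpretation given on this slide; `add_section` is a hypothetical helper name, and the code is plain C++ rather than the new syntax itself.

```cpp
#include <cstddef>
#include <vector>

// Scalar equivalent of  a[0:s] += b[2:s:2]
// a: s elements starting at index 0, stride 1
// b: s elements starting at index 2, stride 2
void add_section(std::vector<double>& a, const std::vector<double>& b, std::size_t s) {
    for (std::size_t i = 0; i < s; ++i) {
        a[0 + i] += b[2 + 2 * i];
    }
}
```

The caller is responsible for the bounds: `a` needs at least `s` elements and `b` at least `2 * s + 1` when `s > 0`.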
Compiler Pro 12.0 - Cilk Integration
Feature | Example | Semantics
Spawning a function call | x = cilk_spawn func(y); | func executes asynchronously
Synchronization statement | cilk_sync; | Wait for all children spawned inside the current function
Parallel for loop | cilk_for (int i = 0; i < N; i++) { statement; } | Loop iterations execute in parallel
Hyperobjects | | Allow parallelization of reduction operation with minimal source changes
* Disclaimer: Keyword spelling not finalized.
Cilk semantics are simple and powerful
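One way to read the Cilk keywords is through their serial meaning: if the keywords are defined away, the program is still valid C++ and computes the same result, only without parallelism. The sketch below assumes the keyword spellings `cilk_spawn`, `cilk_sync`, and `cilk_for` (which this slide flags as not finalized) and replaces them with stand-in macros so that it builds with any C++ compiler. `fib` and `squares` are hypothetical examples, and hyperobjects are not shown.

```cpp
#include <vector>

// Stand-in definitions (an assumption for this sketch): they turn the keywords
// into no-ops so the code runs serially on a compiler without Cilk support.
#define cilk_spawn
#define cilk_sync
#define cilk_for for

long fib(int n) {
    if (n < 2) return n;
    long x, y;
    x = cilk_spawn fib(n - 1);  // with Cilk support, may run asynchronously
    y = fib(n - 2);             // continues in the current function
    cilk_sync;                  // wait for the spawned child before using x
    return x + y;
}

std::vector<int> squares(int n) {
    std::vector<int> out(n);
    cilk_for (int i = 0; i < n; i++) {  // iterations write distinct elements
        out[i] = i * i;
    }
    return out;
}
```

The loop body touches only `out[i]`, so iterations do not depend on each other. A shared accumulator would instead need the reduction support that hyperobjects are described as providing.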
Compiler Pro 12.0 – New Optimization
Features
• Vectorization improvements, e.g.
– Loops with mixed data types
– Enhanced AVX support
– “Vectorize or fail” pragmas (pragma SIMD)
• Improved auto-parallelization
– Enhanced privatization analysis
– Declare functions whose calls can safely be parallelized
– Guided Auto-Parallelization
(Advice for vectorization, parallelization and data transformation)
• Matrix multiply intrinsics may call into MKL
• More loop multi-versioning
– auto-parallelization
– memcpy generation
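A minimal sketch of how a “vectorize or fail” pragma might be applied to a loop. It assumes the spelling `#pragma simd` and guards it with the `__INTEL_COMPILER` macro so that other compilers skip it; `saxpy` is a hypothetical function name, and the exact pragma spelling and clauses may differ in the released product.

```cpp
// Scale-and-add loop: y[i] += alpha * x[i] for i in [0, n).
// The caller must ensure x and y do not overlap, because the pragma asks the
// compiler to vectorize without proving independence itself.
void saxpy(float alpha, const float* x, float* y, int n) {
#if defined(__INTEL_COMPILER)
#pragma simd
#endif
    for (int i = 0; i < n; ++i) {
        y[i] += alpha * x[i];
    }
}
```

Without the pragma the loop is still correct; the compiler just falls back to its own vectorization analysis.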
Compiler Pro 12.0 – New Optimization
Features
• Loop profile option
– Loop level profile data
– In addition to function level profile data already available
– -profile-loop-report=2
• Statement-specific inlining pragmas
– #pragma forceinline
– #pragma inline
– #pragma noinline
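A short sketch of statement-specific inlining: each pragma is placed directly before a statement and applies to the calls in that statement only. The pragmas are guarded with `__INTEL_COMPILER` so that other compilers skip them; `square` and `sum_of_squares` are hypothetical names for illustration.

```cpp
static int square(int v) { return v * v; }

int sum_of_squares(int a, int b) {
    int total = 0;
#if defined(__INTEL_COMPILER)
#pragma forceinline   // request inlining of calls in the next statement
#endif
    total += square(a);
#if defined(__INTEL_COMPILER)
#pragma noinline      // keep calls in the next statement out of line
#endif
    total += square(b);
    return total;
}
```

The same function is treated differently at its two call sites, which is the extra control over function-level inlining hints. The result is the same either way.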
Compiler Pro 12.0 – Other Features
• Security enhancements (Correctness HE plus compiler)
– Improved source checking, static memory checker
– Integrated with runtime checker and GUI in HE Studio (required)
• Math library enhancements
– Optional library for fast math functions, but at lower accuracy
– Optional library for consistent results on different processors
• Debugger enhancements
– STL object visualization
– AVX disassembly and register display
• Improved compile time
Compiler Pro 12.0 – Other Features
• Debugger enhancements
– STL object visualization (including TBB debug support)
– AVX disassembly and register display
• Standardized new directory structure
– A generic, default fixed path to current compiler and libraries
• Example, Linux*: /opt/intel/compilerpro/bin
• No need to modify makefiles or build scripts with new versions
• Specific versions still installed side-by-side
• Generic paths use symbolic links to specific version
• Default is the last compiler installed (assumes versions are installed in order of release)
• Version-specific paths still available for explicit control of version
– Performance libraries follow similar scheme:
• Example, Linux*: /opt/intel/mkl/lib/intel64
Intel® MKL Domains and Parallelism
Where’s the Parallelism?
Domain
SIMD
OpenMP
BLAS 1, 2, 3
X
X
FFTs
X
X
LAPACK
X
X
(dense LA solvers)
(relies on BLAS 3)
PARDISO (sparse solver)
VML/VSL
X
X
X
X
ScaLAPACK
(cluster dense LA solvers)
X
(hybrid)
Cluster FFT
Summary Statistics
MPI
X
X
X
Major New Features for Intel® MKL in 2010
• Extended AVX/Sandy Bridge support
– Many optimized kernels
• More C/C++ support
– CLAPACK with row major matrices
• Dynamic accuracy control for VML
• Added C-style 0-based index arrays in PARDISO
• New symmetric matrix-vector product BLAS routine in
blocked storage
• Added Split Complex (real ) support for 2D/3D FFTs
– See http://software.intel.com/en-us/whatif/
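To illustrate what C-style 0-based index arrays mean for a sparse solver such as PARDISO, the sketch below shifts a compressed sparse row (CSR) matrix from Fortran-style 1-based indices to 0-based ones. `CsrMatrix` and `to_zero_based` are hypothetical names for this example and are not MKL types or routines.

```cpp
#include <vector>

// Hypothetical CSR container for illustration; not an MKL type.
struct CsrMatrix {
    std::vector<int> row_ptr;    // start of each row in col_idx/values, plus end marker
    std::vector<int> col_idx;    // column index of each stored value
    std::vector<double> values;  // nonzero values, row by row
};

// Shift Fortran-style 1-based index arrays to C-style 0-based ones.
CsrMatrix to_zero_based(CsrMatrix m) {
    for (int& p : m.row_ptr) p -= 1;
    for (int& c : m.col_idx) c -= 1;
    return m;
}
```

For the 2x2 matrix with rows (4, 0) and (1, 3), the 1-based arrays are `row_ptr = {1, 2, 4}` and `col_idx = {1, 1, 2}`; the 0-based form is `row_ptr = {0, 1, 3}` and `col_idx = {0, 0, 1}`. Accepting 0-based arrays directly removes the need for this kind of conversion in C/C++ callers.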
Major New Features for Intel® MKL in 2010
cont.
• New SFMT19937 Random Number Generator with >=2x
gain over current implementation (MT19937)
• Added ability for users to build custom dynamic libraries
from MKL dynamic libraries
• Summary Statistics Library
– threaded analysis of multi-dimensional data sets
• Quantiles, moments, correlations
New Features for Intel® IPP in 2010
• Optimized for new processors
– Westmere and new AES instructions
– Sandy Bridge and AVX
• DMIP infrastructure and DMIP sample for 2D image
processing
– CPU and multi-core
• JPEG-XR CODEC
• JPEG Sample Productization
– enhanced tests, testing infrastructure
New Features for Intel® IPP in 2010
• Windows Imaging Component (WIC) API wrapper for
Image Codecs
• Data Compression Library support (bzip2, zlib, gzip)
• C String library (limited scope)
• DMIP DSL for Tighter Visual Studio integration
• Reduced IPP package size
– Deprecated obsolete CPU optimizations; Smart dispatcher
• Improved documentation
Intel® Correctness and Performance
Tools Beta
Correctness and Performance Tools
• Intel Cantua HE Checker
– Dynamic and static checking for memory and threading errors
• Intel Piersol HE Profiler
– Next generation performance analyzer
• Serial and parallel profilers
• Major changes to infrastructure of our analysis products
– GUI improvements
• Activity selection more in line with workflow
• More intuitive information displays
– Improved technology
• PIN replaces bistro
– Friendlier environment for network installs and execution
– Improved filtering/sorting utilities
Software Development Challenges
Problem Statement
• Size and complexity of applications are greater; organizations are facing higher application
defects, vulnerabilities and costs
Need:
• Reduce time, effort, and costs required to find and repair coding defects and
vulnerabilities, prior to deploying software
• Increase developer productivity, reduce maintenance, and improve software quality
• Effective analysis tools to maximize code quality, reliability, and security, while
minimizing cost and time
Cost Factors – Square Project Analysis:
• Reworking defects: 40%-50% of total project effort*
• Using Software Automated Quality tools in the development cycle increases ROI by 12%-21%*
*CERT: U.S. Computer Emergency Readiness Team, and Carnegie Mellon CyLab
*NIST: National Institute of Standards & Technology: Square Project Results
Solution
• Enterprise-class software analysis tools that
can effectively find crucial code defects during
the development phase
– Pinpoint critical memory and threading errors
– Enhance efficiencies, code reliability, quality, and security
– Improve developer productivity during the development and quality assurance phases
– Reduce maintenance time and increase ROI
Introducing Intel’s Cantua HE Checker
Memory and Thread Analysis Solution
Ensures Code Quality and Reliability
Value At-A-Glance
Intel’s Cantua HE Checker: Standalone Correctness Tool
Enterprise-class solution
• Built-in Dynamic tools for Serial and Parallel code
Dynamic Memory Analysis
• Detects Memory Leaks and Memory corruption
• Invalid Memory Accesses
• Invalid Partial Memory Accesses
• Mismatched Memory Allocation/Deallocation
• Missing Allocations
• Uninitialized Memory Accesses & Partial Memory Accesses
Dynamic Thread Analysis
• Detects Deadlocks and Data Races
Productivity Tools
• Intuitive GUI and CLI interface for a wide range of functions
• Dynamic Instrumentation
• Detailed error data analysis
• Time line visualization
Beta starts week of April 5, 2010
Introducing Intel’s HE Studio
Minimizes risk, Maximizes Value
Ensures Code Security, Reliability, Quality, Lowers TCO
Value At-A-Glance
Intel’s HE Studio: Correctness Tool
Enterprise-class Code analysis solution
• Built-in Static and Dynamic tools for Serial and Parallel code
Dynamic Memory Analysis
• Detects Memory Leaks and Memory corruption
• Invalid Memory Accesses
• Invalid Partial Memory Accesses
• Mismatched Memory Allocation/Deallocation
• Missing Allocations
• Uninitialized Memory Accesses & Partial Memory Accesses
Dynamic Thread Analysis
• Detects Deadlocks and Data Races
Static Security Analysis: detects wide range (>250) of errors and software security vulnerabilities
• Buffer overruns
• Uninitialized variables and Bad pointers
• Unsafe library usage
• Arithmetic overflow
• OpenMP* errors, unchecked input, heap corruption, error prone usage
Productivity Tools
• Intuitive GUI and CLI interface for a wide range of functions
• Dynamic Instrumentation
• Detailed error data analysis and Time line visualization
Intel Correctness Tools
Product Family Positioning
Intel® Thread Checker (Enterprise/HPC)
• Data race and deadlock detection
• C/C++ & Fortran
• Microsoft Visual Studio 2003, 2005, 2008
• Windows XP, Vista
• Linux
Intel® Parallel Inspector (Mainstream)
• Memory analysis: leaks and corruption
• Thread analysis: data races and deadlocks
• C/C++, Microsoft Visual Studio 2005, 2008 (next major release: 2010)
Intel’s Cantua HE Checker (Enterprise/HPC)
• Memory analysis: leaks and corruption
• Thread analysis: data races and deadlocks
• C/C++, Fortran and .NET
• Windows, Linux, Microsoft Visual Studio 2005, 2008, 2010
• Standalone package
Intel’s HE Studio (Enterprise/HPC)
• Static Security analysis
• Memory analysis: leaks and corruption
• Thread analysis: data races and deadlocks
• C/C++, Fortran and .NET
• Windows, Linux, Microsoft Visual Studio 2005, 2008, 2010
Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR
OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES
NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS
INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE,
MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY
RIGHT.
Performance tests and ratings are measured using specific computer systems and/or components and
reflect the approximate performance of Intel products as measured by those tests. Any difference in system
hardware or software design or configuration may affect actual performance. Buyers should consult other
sources of information to evaluate the performance of systems or components they are considering
purchasing. For more information on performance tests and on the performance of Intel products, reference
www.intel.com/software/products.
Intel® and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2010. Intel Corporation.
http://www.intel.com/software/products