Intel® PerfMon Performance Monitoring Hardware Overview
Software & Services Group

PerfMon Basics
• PerfMon is hardware throughout the silicon, available through registers to tools, to facilitate several system/application usages:
  – compiler analysis
  – workload characterization
  – performance tuning and debug
• Two counting paradigms used
  – Global/absolute count
  – Sampling on overflow
• Previous incompatibility among processor families has limited its value
  – First introduced on the Pentium processor
  – Model-specific changes added to P6, Pentium 4, and Xeon processors
• Intel® Core™ Solo processors were the first to support Architectural PerfMon
  – Arch PerfMon ensures all base analysis capabilities across all Intel processors

Performance Tuning Methodology
• Top Down Iterative Approach is used at every ‘level’
System Level
• Process/Memory Utilization
• Network
• Disk & IO
• Context Switches/Paging/System Calls
Application Level
• Locks
• Heaps
• Parallelism / Execution Threads
• APIs
Microarchitecture Level (PerfMon)
• Processor Stalls
• Branch Prediction
• Data and code alignment
• Glass Jaws
Tools*: vmstat, NetMon, MS PerfMon, APImon, Quantify, VTune, PTU, xIF
*Many customer & open source tools
PerfMon is hardware behind all microarchitecture level tuning/analysis

Intel® Tools
• PerfMon based analysis tools methodology: prototype methodologies are released in free analysis tools, followed by full production support in releases of Intel® VTune™ Analyzer!
• Prototype tools available @ whatif.intel.com
  – Available today: PTU (Performance Tuning Utility)
    • Much of the All New VTune™ Analyzer is based on PTU.
  – xIF (platform x Issue Finder – microarchitecture focus)
    • Beta is due out next week.
    • Production is due early July internally.
    • Production externally is due in September.
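
The two counting paradigms named under PerfMon Basics (global/absolute count and sampling on overflow) can be illustrated with a small software model. This is only a sketch: `ModelCounter`, `runModel`, the `sampleAfter` threshold and the synthetic addresses are hypothetical names invented for illustration, and nothing here programs a real PerfMon register.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical software model of one performance counter.
// It is NOT a real PMU interface; names and fields are illustrative only.
struct ModelCounter {
    std::uint64_t total = 0;            // global/absolute count of events
    std::uint64_t sinceOverflow = 0;    // events seen since the last overflow
    std::uint64_t sampleAfter;          // overflow threshold (assumed parameter)
    std::vector<std::uint64_t> samples; // code addresses captured on overflow

    explicit ModelCounter(std::uint64_t threshold) : sampleAfter(threshold) {
        assert(threshold > 0);
    }

    // Record one event attributed to the code address `ip`.
    void onEvent(std::uint64_t ip) {
        ++total;                               // paradigm 1: keep an absolute count
        if (++sinceOverflow == sampleAfter) {  // paradigm 2: sample on overflow
            samples.push_back(ip);             // remember where the overflow happened
            sinceOverflow = 0;                 // re-arm the counter
        }
    }
};

// Drive the model with `events` synthetic events spread over 64 fake addresses.
ModelCounter runModel(std::uint64_t events, std::uint64_t threshold) {
    ModelCounter c(threshold);
    for (std::uint64_t i = 0; i < events; ++i) {
        c.onEvent(0x400000 + (i % 64));
    }
    return c;
}
```

With `runModel(10500, 1000)` the model reports an absolute count of 10500 and records 10 samples. The absolute count answers "how many events occurred", while the sampled addresses indicate roughly where in the code they occurred, which is the information a sampling tool would attribute to functions or source lines.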

x Issue Finder (xIF) Analyzer for Microarchitecture Tuning
• xIF breaks down code into streams
• Each stream is analyzed against ‘best case’ performance using Intel preset platform ‘Issue List’ to prioritize tuning efforts to find:
  – Locks & Waits – Where is my program waiting, overusing system calls?
  – Front-End Issues – Find the compiler opportunities
  – Cache analysis & false sharing
  – Memory/Data Stalling
  – Branching issues
  – More… Code/Data Layout, …

Market Leading Software Tools Optimize Multicore Performance
Focus tool utilizing PerfMon Hardware

Intel’s Piersol HE Profiler
All New VTune™ Analyzer
• New User Interface
• Next Generation Collection Technology
• Integrated Threading Timeline
Product Features
Intel’s HE Studio: Static Security Analysis, Dynamic Memory and Thread Analysis
Supported Environments: C/C++, Fortran and .NET; Windows, Linux, Microsoft Visual Studio 2005, 2008, 2010
Availability: Q4/2010

Intel’s Piersol HE Profiler
Performance Analysis System
• Windows* & Linux*
• GUI, Command line, Visual Studio integration
• No special compilers or recompile
Setup, Collect, Analyze
• Setup: Presets Simplify Setup
• Collect: Multiple Data Collectors
• Analyze: Multiple Views Simplify Analysis

Intel’s Piersol HE Profiler
Preset Experiments Simplify Setup
Basic Presets: Run these first
Advanced Presets: Get more detail
Custom: Full Control
• Hotspot – Where is my app spending time?
• Concurrency – Where is my concurrency poor?
• Locks & Waits – Where is my program waiting?
• Tuning Decision Tree – Find the opportunities
• Cache analysis & false sharing
• Branching issues
• More… Structural hazards, …
• All events – Detailed micro-architectural analysis
• All parameters

Intel’s Piersol HE Profiler
Multiple Views Simplify Analysis
Grid View: Sort Tabular Data
Timeline View: Thread Transitions & CPU Utilization
• Display call tree and event data
• Color coded CPU utilization
• See transitions
• See utilization
• Visualize Frames
Source/Asm View: See Data on Source
• Line by line data
• ASM & source side by side
• Basic blocks
Cross Selection & Filters: Quickly find the answer you need
• Double click in grid to open source / asm
• Select in timeline to highlight functions in grid
• Select in source to highlight asm basic blocks
• Filter by module / process / function / frame

Intel’s Piersol HE Profiler
All New VTune™ Performance Analyzer
What’s New: So What?
• Presets for performance experiments: Much easier setup
• Statistical call tree: Faster startup and faster run
• Threading timeline: Visualize thread transitions, PMU data,…
• Cross-select and filter results: Find the data you need faster
• Attach to a running process on Windows: Profile your app without restarting
• Better source/asm view w/ heat map: Easier navigation, “skid” compensation
• Frame based analysis: Games & graphics – find the slow frame; Transactions – find slow transaction
• Event multiplexing: More data with fewer runs
• Uncore support: Processor wide metrics (e.g., bandwidth)
• Linux* – no EntireX*, no Eclipse* required: Fewer conflicts. Lighter footprint.
• Drivers not required unless using EBS: Root only required for driver install
• Smart support for TBB & OpenMP*: Display user names / source constructs

Compiler Professional Ed. 12.0 Preview
• Support for new and emerging OSes:
  – Red Hat* EL 6
  – Ubuntu* 10
• Support for new IDE versions, e.g.
  – Microsoft Visual Studio* 2010
  – Eclipse* CDT 6.0 (C++ only)
• Support for new processors, e.g.
  – Sandy Bridge (enhanced AVX support)
• Timeline
  – Beta targeted for late May or June 2010
  – Product release targeted for Q4 2010
  – Request beta access from [email protected]
Product names, features, and target dates subject to change

Intel’s C++ and Fortran Compilers
Performance Compilers
What’s New: So What?
• Co-Array Fortran: Data decomposition in Fortran syntax
• Fortran 2003 and 2008 support; C++0x and C99 new features: Support of the latest Fortran and C++ standards continues
• Latest processor support (string instructions for SSE4.2 and AVX support): Get the best performance on the latest processors
• Cilk: Simple keywords for task parallelism
• New array syntax for C/C++: More readable/optimizable array operations
• Vectorization and auto-parallelization improvements: Compiler can vectorize and parallelize more often giving increased performance
• Loop profile: Profile at the loop (not just function level)
• Statement specific inlining pragmas: More control of inlining
• Improved compile time: Lower cost of development
• Matrix multiply call to MKL: Faster matrix multiplies without code change

Compiler Pro 12.0 New Fortran Features
• Support for Co-Array Fortran
  – Shared memory support in Compiler Professional Edition
  – Distributed & shared memory support in Cluster Tool suite
  – Uses Intel® MPI technology
  – Can’t mix with OpenMP or explicit MPI calls
• Other Fortran 2008 features
  – DO CONCURRENT
  – CONTIGUOUS
  – I/O enhancements
  – New constants in ISO_FORTRAN_ENV
  – New intrinsic functions
• Fortran 2003 Support
  – Complete type-bound procedures (GENERIC, OPERATOR,..)
  – FINALization

Compiler Pro 12.0 – New C++ Features
• More C++0x and C99 features
  – E.g.
rvalue references
    • reduce temporary copies
  – Maintain Microsoft Visual Studio* compatibility
• Optimized string intrinsics
  – Using SSE4.2 instructions
• New array syntax for C/C++
  – More readable; helps the compiler vectorize and parallelize
  – Somewhat similar to Fortran90 concept
    • <array base> [ <lower bound> : <length> [ : <stride>] ]
    • E.g. a[0:s] += b[2:s:2]
• Incorporation of Cilk technology
  – Simple keywords, especially for task parallelism
  – “Hyperobjects” to facilitate thread-safe access
    • Provide thread safe reduction operations

Compiler Pro 12.0 - Cilk Integration
Feature: Example: Semantics
• Spawning a function call: x = cilk_spawn func(y); : func executes asynchronously
• Synchronization statement: cilk_sync; : Wait for all children spawned inside the current function
• Parallel for loop: cilk_for (int i = 0; i < N; i++) { statement; } : Loop iterations execute in parallel.
• Hyperobjects: Allow parallelization of reduction operation with minimal source changes
* Disclaimer: Keyword spelling not finalized.
Cilk semantics are simple and powerful

Compiler Pro 12.0 – New Optimization Features
• Vectorization improvements, e.g.
  – Loops with mixed data types
  – Enhanced AVX support
  – “Vectorize or fail” pragmas (pragma SIMD)
• Improved auto-parallelization
  – Enhanced privatization analysis
  – Declare functions whose calls can safely be parallelized
  – Guided Auto-Parallelization (advice for vectorization, parallelization and data transformation)
• Matrix multiply intrinsics may call into MKL
• More loop multi-versioning
  – auto-parallelization
  – memcpy generation

Compiler Pro 12.0 – New Optimization Features
• Loop profile option
  – Loop level profile data
  – In addition to function level profile data already available
  – -profile-loop-report=2
• Statement-specific inlining pragmas
  – #pragma forceinline
  – #pragma inline
  – #pragma noinline

Compiler Pro 12.0 – Other Features
• Security enhancements (Correctness HE plus compiler)
  – Improved source checking, static memory checker
  – Integrated with runtime checker and GUI in HE Studio (required)
• Math library enhancements
  – Optional library for fast math functions, but at lower accuracy
  – Optional library for consistent results on different processors
• Debugger enhancements
  – STL object visualization
  – AVX disassembly and register display
• Improved compile time

Compiler Pro 12.0 – Other Features
• Debugger enhancements
  – STL object visualization (including TBB debug support)
  – AVX disassembly and register display
• Standardized new directory structure
  – A generic, default fixed path to current compiler and libraries
    • Example, Linux*: /opt/intel/compilerpro/bin
    • No need to modify makefiles or build scripts with new versions
    • Specific versions still installed side-by-side
    • Generic paths use symbolic links to specific version
    • Default is the last compiler installed (assumes versions are installed in order of release)
    • Version-specific paths still available for explicit control of version
  – Performance
libraries follow similar scheme:
    • Example, Linux*: /opt/intel/mkl/lib/intel64

Intel® MKL Domains and Parallelism
Where’s the Parallelism?
Domain SIMD Open MP BLAS 1, 2, 3 X X FFTs X X LAPACK X X (dense LA solvers) (relies on BLAS 3) PARDISO (sparse solver) VML/VSL X X X X ScaLAPACK (cluster dense LA solvers) X (hybrid) Cluster FFT Summary Statistics MPI X X X

Major New Features for Intel® MKL in 2010
• Extended AVX/Sandy Bridge support
  – Many optimized kernels
• More C/C++ support
  – CLAPACK with row major matrices
• Dynamic accuracy control for VML
• Added C-style 0-based index arrays in PARDISO
• New symmetric matrix-vector product BLAS routine in blocked storage
• Added Split Complex (real) support for 2D/3D FFTs
  – See http://software.intel.com/en-us/whatif/

Major New Features for Intel® MKL in 2010 cont.
• New SFMT19937 Random Number Generator with >=2x gain over current implementation (MT19937)
• Added ability for users to build custom dynamic libraries from MKL dynamic libraries
• Summary Statistics Library – threaded analysis of multi-dimensional data sets
  – Quantiles, moments, correlations

New Features for Intel® IPP in 2010
• Optimized for new processors
  – Westmere and new AES instructions
  – Sandy Bridge and AVX
• DMIP infrastructure and DMIP sample for 2D image processing
  – CPU and multi-core
• JPEG-XR CODEC
• JPEG Sample Productization
  – enhanced tests, testing infrastructure

New Features for Intel® IPP in 2010
• Windows Imaging Component (WIC) API wrapper for Image Codecs
• Data Compression Library support (bzip2, zlib, gzip)
• C String library (limited scope)
• DMIP DSL for Tighter Visual Studio integration
• Reduced IPP package size
  – Deprecated obsolete CPU optimizations; Smart dispatcher
• Improved documentation

Intel®
Correctness and Performance Tools Beta

Correctness and Performance Tools
• Intel Cantua HE Checker
  – Dynamic and static checking for memory and threading errors
• Intel Piersol HE Profiler
  – Next generation performance analyzer
    • Serial and parallel profilers
• Major changes to infrastructure of our analysis products
  – GUI improvements
    • Activity selection more in line with workflow
    • More intuitive information displays
  – Improved technology
    • PIN replaces bistro
  – Friendlier environment for network installs and execution
  – Improved filtering/sorting utilities

Software Development Challenges
Problem Statement
• Size and complexity of applications are greater; organizations are facing higher application defects, vulnerabilities and costs
Need:
• Reduce time, effort, and costs required to find and repair coding defects and vulnerabilities, prior to deploying software
• Increase developer productivity, reduce maintenance, and improve software quality
• Effective analysis tools to maximize code quality, reliability, and security, while minimizing cost and time
Cost Factors – Square Project Analysis:
• Reworking defects 40%-50% of total project effort*
• Using Software Automated Quality tools in development cycle increases ROI by 12%-21%*
*CERT: U.S.
Computer Emergency Readiness Team, and Carnegie Mellon CyLab
*NIST: National Institute of Standards & Technology: Square Project Results

Solution
• Enterprise-class software analysis tools that can effectively find crucial code defects during the development phase
  – Pinpoints critical memory and threading errors
  – Enhances efficiencies, code reliability, quality, and security
  – Improves developer productivity during the development and quality assurance phases
  – Reduced maintenance time, and increased ROI

Introducing Intel’s Cantua HE Checker
Memory and Thread Analysis Solution Ensures Code Quality and Reliability
Intel’s Cantua HE Checker: Standalone Correctness Tool
Value At-A-Glance
Enterprise-class solution
• Built-in Dynamic tools for Serial and Parallel code
Dynamic Memory Analysis
• Detects Memory Leaks and Memory corruption
• Invalid Memory Accesses
• Invalid Partial Memory Accesses
• Mismatched Memory Allocation/Deallocation
• Missing Allocations
• Uninitialized Memory Accesses & Partial Memory Accesses
Dynamic Thread Analysis
• Detects Deadlocks and Data Races
Productivity Tools
• Intuitive GUI and CLI interface for a wide range of functions
• Dynamic Instrumentation
• Detailed error data analysis
• Time line visualization
Beta starts week of April 5, 2010

Introducing Intel’s HE Studio
Minimizes Risk, Maximizes Value
Ensures Code Security, Reliability, Quality, Lowers TCO
Intel’s HE Studio: Correctness Tool
Value At-A-Glance
Enterprise-class code analysis solution
Dynamic Memory Analysis
Dynamic Thread Analysis
Static Security Analysis: Detects wide range (>250) of errors and software security vulnerabilities
Productivity Tools
• Built-in Static and Dynamic tools for Serial and Parallel code
• Detects Memory Leaks and Memory corruption
• Invalid Memory Accesses
• Invalid Partial Memory Accesses
• Mismatched Memory Allocation/Deallocation
• Missing Allocations
• Uninitialized
Memory Accesses & Partial Memory Accesses
• Detects Deadlocks and Data Races
• Buffer overruns
• Uninitialized variables and Bad pointers
• Unsafe library usage
• Arithmetic overflow
• OpenMP* errors, unchecked input, heap corruption, error prone usage
• Intuitive GUI and CLI interface for a wide range of functions
• Dynamic Instrumentation
• Detailed error data analysis and Time line visualization

Intel Correctness Tools Product Family Positioning
Intel® Thread Checker (Enterprise/HPC)
• Data race and deadlock detection
• C/C++ & Fortran
• Microsoft Visual Studio 2003, 2005, 2008
• Windows XP, Vista
• Linux
Intel® Parallel Inspector (Mainstream)
• Memory analysis: leaks and corruption
• Thread analysis: data races and deadlocks
• C/C++ Microsoft Visual Studio 2005, 2008, (next major release 2010)
Intel’s Cantua HE Checker (Enterprise/HPC)
• Memory analysis: leaks and corruption
• Thread analysis: data races and deadlocks
• C/C++, Fortran and .NET
• Windows, Linux, Microsoft Visual Studio 2005, 2008, 2010
• Standalone package
Intel’s HE Studio (Enterprise/HPC)
• Static Security analysis
• Memory analysis: leaks and corruption
• Thread analysis: data races and deadlocks
• C/C++, Fortran and .NET
• Windows, Linux, Microsoft Visual Studio 2005, 2008, 2010

Legal Disclaimer
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests.
Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products. Intel® and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2010. Intel Corporation. http://www.intel.com/software/products