Today’s Agenda Role of Performance BITS Pilani, Pilani Campus Definition of Performance • For some program running on machine X: PerformanceX = 1 / Execution timeX • X is n times faster than Y means: PerformanceX / PerformanceY = n BITS Pilani, Pilani Campus Performance Equation I seconds cycles seconds program program cycle equivalently CPU execution time for a program = CPU clock cycles for a program Clock cycle time • So, to improve performance one can either: – reduce the number of cycles for a program, or – reduce the clock cycle time, or, equivalently, – increase the clock rate BITS Pilani, Pilani Campus Performance Equation II CPU execution time for a program = Instruction count average CPI Clock cycle time for a program • Derive the above equation from Performance Equation I seconds cycles seconds program program cycle Average CPI: •It provides one way to compare two different implementations of same ISA since instruction count for a program remain same •It defers with processor design •It also depends on the mix of instruction types executed in an application. BITS Pilani, Pilani Campus CPI Example • • • • Computer A: Cycle Time = 250ps, CPI = 2.0 Computer B: Cycle Time = 500ps, CPI = 1.2 Same ISA Which is faster, and by how much? CPU Time CPU Time A Instructio n Count CPI Cycle Time A A A is faster… I 2.0 250ps I 500ps B Instructio n Count CPI Cycle Time B B I 1.2 500ps I 600ps B I 600ps 1.2 CPU Time I 500ps A CPU Time …by this much BITS Pilani, Pilani Campus CPI for Instruction Class • CPU clock cycles can be computed by looking at different types of instructions and using their individual clock cycle counts. • Formula: n CPU clock cycles = (CPIi X Ci) i=1 • Where, Ci is count of instructions of class I and CPIi is number of cycles per instruction for that instruction class. 6 BITS Pilani, Pilani Campus CPI Example • Alternative compiled code sequences using instructions in classes A, B, C Class A B C CPI for class 1 2 3 IC in sequence 1 2 1 2 IC in sequence 2 4 1 1 Sequence 1: IC = 5 Clock Cycles = 2×1 + 1×2 + 2×3 = 10 Avg. CPI = 10/5 = 2.0 Sequence 2: IC = 6 Clock Cycles = 4×1 + 1×2 + 1×3 =9 Avg. CPI = 9/6 = 1.5 BITS Pilani, Pilani Campus Performance Summary Instructio ns Clock cycles Seconds CPU Time Program Instructio n Clock cycle • Performance depends on – – – – – Algorithm: affects IC, possibly CPI Programming language: affects IC, CPI Compiler: affects IC, CPI Instruction set architecture: affects IC, CPI Clock cycle time and CPI are determined by Implementation of the processor. BITS Pilani, Pilani Campus Evaluating Performance of a machine • One way is to compare the execution time of the workload on two different computers. • Another better way is to evaluate a computer using set of benchmark programs specially chosen to measure performance. • The best type of programs to use for benchmarks are real applications. • For example, typical engineering or scientific applications • For software development engineers benchmarks may include programs such as compiler or word processing systems. 9 BITS Pilani, Pilani Campus Benchmarks • Benchmark suites – – – – Perfect Club: set of application codes Livermore Loops: 24 loop kernels Linpack: linear algebra package SPEC: mix of code from industry organization BITS Pilani, Pilani Campus SPEC (System Performance Evaluation Corporation) • Sponsored by industry but independent and self-managed – trusted by code developers and machine vendors • Clear guides for testing, see www.spec.org • Regular updates (benchmarks are dropped and new ones added periodically according to relevance) • Specialized benchmarks for particular classes of applications BITS Pilani, Pilani Campus SPEC History • First Round: SPEC CPU89 – 10 programs yielding a single number • Second Round: SPEC CPU92 – SPEC CINT92 (6 integer programs) and SPEC CFP92 (14 floating point programs) – compiler flags can be set differently for different programs • Third Round: SPEC CPU95 – new set of programs: SPEC CINT95 (8 integer programs) and SPEC CFP95 (10 floating point) – single flag setting for all programs • Fourth Round: SPEC CPU2000 – new set of programs: SPEC CINT2000 (12 integer programs) and SPEC CFP2000 (14 floating point) – single flag setting for all programs – programs in C, C++, Fortran 77, and Fortran 90 BITS Pilani, Pilani Campus CINT2000 (Integer component of SPEC CPU2000) Program Language What It Is 164.gzip 175.vpr 176.gcc 181.mcf 186.crafty 197.parser 252.eon 253.Perlbmk 254.gap 255.Vortex 256.bzip2 300.twolf C C C C C C C++ C C C C C Compression FPGA Circuit Placement and Routing C Programming Language Compiler Combinatorial Optimization Game Playing: Chess Word Processing Computer Visualization PERL Programming Language Group Theory, Interpreter Object-oriented Database Compression Place and Route Simulator BITS Pilani, Pilani Campus CFP2000 (Floating point component of SPEC CPU2000) Program Language 168.wupwise 171.swim 172.mgrid 173.applu 177.mesa 178.Galgel 179.art 183.equake 187.facerec 188.ammp 189.lucas 191.fma3d 200.sixtrack 301.apsi Fortran 77 Fortran 77 Fortran 77 Fortran 77 C Fortran 90 C C Fortran 90 C Fortran 90 Fortran 90 Fortran 77 Fortran 77 What It Is Physics / Quantum Chromodynamics Shallow Water Modeling Multi-grid Solver: 3D Potential Field Parabolic / Elliptic Differential Equations 3-D Graphics Library Computational Fluid Dynamics Image Recognition / Neural Networks Seismic Wave Propagation Simulation Image Processing: Face Recognition Computational Chemistry Number Theory / Primality Testing Finite-element Crash Simulation High Energy Physics Accelerator Design Meteorology: Pollutant Distribution BITS Pilani, Pilani Campus Amdahl's Law • Amdahl's law, also known as Amdahl's argument, is named after computer architect Gene Amdahl • This law is used to find the maximum expected improvement to an overall system when only part of the system is improved. 15 BITS Pilani, Pilani Campus Amdahl's Law Execution Time After Improvement = Execution Time Unaffected +( Execution Time Affected / Amount of Improvement ) Time before Improvement Time after Improvement 16 BITS Pilani, Pilani Campus Amdahl’s Law • • 1 Speed-up = Perfnew / Perfold =Exec_timeold / Exec_timenew = f (1 f ) P Performance improvement from using faster mode is limited by the fraction the faster mode can be applied. Told f (1 - f) Tnew (1 - f) f/P 17 BITS Pilani, Pilani Campus Example • "Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?" How about making it 5 times faster? • There is no amount by which multiple can be enhanced to achieve 5 times increase in performance. • Thus performance enhance possible with a given improvement is limited by the amount that the improved feature is used. • Principle: Make the common case fast 18 BITS Pilani, Pilani Campus Remember • Performance is specific to a particular program/s – Total execution time is a consistent summary of performance • For a given architecture performance increases come from: – increases in clock rate (without adverse CPI affects) – improvements in processor organization that lower CPI – compiler enhancements that lower CPI and/or instruction count 19 BITS Pilani, Pilani Campus Processor Design Introduction 20 BITS Pilani, Pilani Campus Implementing MIPS Fetch/Execute cycle 21 BITS Pilani, Pilani Campus 22 BITS Pilani, Pilani Campus
© Copyright 2026 Paperzz