1. Computer Abstractions and Technology

Computer Architecture
COMP SCI 2GA3 / SFWR ENG 2GA3
Emil Sekerinski, McMaster University, Fall Term 2015/16

Classes of Computers

Personal Computers
• General purpose, variety of (third-party) software
• Good performance at reasonable cost

Server Computers
• Accessed via a network
• Large workloads: a single (science or engineering) application, many small jobs (web server), or customized applications (database, simulation)
• Range from small (“mini”) servers to supercomputers (e.g. IBM BlueGene/Q) with terabytes of main memory

Embedded Computers
• Hidden as components of systems, e.g. car, TV, router
• Cost and power consumption critical for required performance

PostPC Era

Personal Mobile Device (PMD)
• User interface, power consumption, network, cost critical: phones, tablets, glasses, bands

Cloud Computing
• Warehouse scale computers with 100,000 servers, geographically distributed
• Running Software as a Service (SaaS), with a portion on the PMD and a portion in the Cloud

Five Main Classes of Computers (as of 2012)

Question: What is the largest class (most manufactured)?
A. Personal mobile device
B. Desktop
C. Server
D. Clusters/warehouse-scale computer
E. Embedded

Manufactured Units (as of 2010)
• Personal mobile device: 1.8 billion PMDs (90% phones)
• Desktop: 350 million
• Server: 20 million
• Embedded: 19 billion (6.1 billion ARM-based chips)

What You Will (Not) Learn
• How programs are translated into the machine language and how the hardware executes them
• The hardware/software interface
• What determines program performance and how it can be improved
• How hardware designers improve performance
• Techniques hardware designers use to improve energy efficiency and how programmers can support that
• Why there is a shift from sequential to parallel (“multi-core”) processing and what consequences it has for programmers

Eight “Great Ideas” in Computer Architecture
• Design for Moore’s Law
• Use abstraction to simplify design
• Make the common case fast
• Performance via parallelism
• Performance via pipelining
• Performance via prediction
• Hierarchy of memories
• Dependability via redundancy

Levels of Program Code

High-level language
• Level of abstraction closer to problem domain
• Provides for productivity and portability

Assembly language
• Textual notation of instructions
• Directly represents hardware

Machine language
• Binary digits (bits)
• Encoded instructions and data

Instruction Set Architecture

Instruction set architecture (ISA)
• is the hardware/software interface
• is the specification that hardware designers implement

Application binary interface (ABI)
• is the ISA plus system software interface
• application programmers work with the ABI

Question: Which of the following is true for ISAs in general?
A. Many models of processors can support one ISA.
B. An ISA is unique to one model of processor.
C. Every processor supports multiple ISAs.
D. Each processor manufacturer has its own unique ISA.
E. None of the above.
Components of a Computer

The five classic components are:
• input
• output
• memory
• datapath
• control
Datapath and control together form the processor.

Inside a Computer
[Figure labels: capacitive multitouch LCD screen, computer board]

Inside a Processor
The processor integrated circuit inside the A5 package:
[Figure]

Defining Performance

Which airplane has the best performance?
[Figure: four bar charts comparing the Boeing 777, Boeing 747, BAC/Sud Concorde, and Douglas DC-8-50 by passenger capacity, cruising range (miles), cruising speed (mph), and passengers × mph]

If we define performance as top cruising speed, the Concorde is the fastest. If we want to transport 450 passengers, the 747 has the highest throughput.

Two Questions: Response Time vs Throughput

Response time (execution time): total time required to complete a task
Throughput (bandwidth): total work (number of tasks) done per unit time

1. Consider replacing a processor with a faster one. Does this:
A. increase throughput,
B. decrease response time, or
C. both?

2. Consider adding additional processors to a system that uses multiple processors for separate tasks (e.g. serving http requests). Does this:
A. increase throughput,
B. decrease response time, or
C. both?

Measuring Execution Time

Elapsed time (wall clock time, response time):
• Total response time, including disk access, I/O (e.g. network), OS overhead
• Determines system performance

CPU time
• Time spent processing a given task
• Discounts I/O time, other jobs’ shares
• Comprises user CPU time and system CPU time

Hence we refer to system performance and CPU performance.

CPU Clocking

Operation of digital hardware is governed by a constant-rate clock:
[Timing diagram: clock (cycles); within each clock period, data transfer and computation, then update state]

Clock period: duration of a clock cycle
• e.g. 1000 ps = 1 ns = 0.001 µs = $10^{-6}$ ms = $10^{-9}$ s

Clock frequency (rate): cycles per second
• e.g. 1 GHz = 1000 MHz = $10^{6}$ kHz = $10^{9}$ Hz

CPU performance for a given program:
How to improve CPU Time?

$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$

Instruction Count and Cycles Per Instruction (CPI)

Instruction count for a program
• Determined by program, ISA and compiler

Average cycles per instruction
• Determined by CPU hardware
• If different instructions have different CPI, the average CPI is affected by the instruction mix

$$\text{Clock Cycles} = \text{Instruction Count} \times \text{Cycles per Instruction}$$

$$\text{CPU Time} = \text{Instruction Count} \times \text{CPI} \times \text{Clock Cycle Time} = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}$$

CPI Example
• Computer A: Cycle Time = 250ps, CPI = 2.0
• Computer B: Cycle Time = 500ps, CPI = 1.2

Assuming same ISA (same programs), which is faster, and by how much?

$$\text{CPU Time}_A = \text{Instruction Count} \times \text{CPI}_A \times \text{Cycle Time}_A = I \times 2.0 \times 250\,\text{ps} = I \times 500\,\text{ps}$$

A is faster…

$$\text{CPU Time}_B = \text{Instruction Count} \times \text{CPI}_B \times \text{Cycle Time}_B = I \times 1.2 \times 500\,\text{ps} = I \times 600\,\text{ps}$$

$$\frac{\text{CPU Time}_B}{\text{CPU Time}_A} = \frac{I \times 600\,\text{ps}}{I \times 500\,\text{ps}} = 1.2$$

…by this much

Factors influencing CPU Performance

In general, different instructions can take a different number of cycles. In that case, the weighted average of the CPIs has to be taken.
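The CPI example and the weighted-average remark can be checked numerically. The following is a minimal sketch, not part of the slides: the function names `cpu_time` and `weighted_cpi`, the instruction count of one million, and the instruction mix at the end are all illustrative assumptions.

```python
def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU Time = Instruction Count x CPI x Clock Cycle Time (in seconds)."""
    return instruction_count * cpi * cycle_time_s


def weighted_cpi(mix):
    """Average CPI of an instruction mix.

    mix: iterable of (fraction_of_instructions, cpi) pairs;
    the fractions are expected to sum to 1.
    """
    return sum(fraction * cpi for fraction, cpi in mix)


# Computers A and B from the CPI example. The instruction count is an
# arbitrary assumption; it cancels in the ratio.
I = 1_000_000
time_a = cpu_time(I, 2.0, 250e-12)  # cycle time 250 ps
time_b = cpu_time(I, 1.2, 500e-12)  # cycle time 500 ps
ratio = time_b / time_a             # about 1.2: A is faster by this factor

# Hypothetical mix (assumption): 50% of instructions take 1 cycle,
# 30% take 2 cycles, 20% take 3 cycles.
example_cpi = weighted_cpi([(0.5, 1), (0.3, 2), (0.2, 3)])
```

With this mix the average CPI is 0.5 × 1 + 0.3 × 2 + 0.2 × 3 = 1.7, so the same hardware can show a different average CPI for a program with a different mix.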
$$\text{CPU Time} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$

CPU performance depends on
• Algorithm: affects IC, possibly CPI
• Programming language: affects IC, CPI
• Compiler: affects IC, CPI
• Instruction set architecture: affects IC, CPI, Clock cycle

Power Trends

In CMOS (Complementary Metal Oxide Semiconductor) technology:

$$\text{Power} = \text{Capacitive load} \times \text{Voltage}^2 \times \text{Frequency}$$

Annotations: power ×30, voltage 5V → 1V, frequency ×1000

Power consumption is at its maximum (battery, cooling); voltage cannot be reliably decreased; capacitive load depends on “fanout” and technology: the Power Wall.

Uniprocessor Performance

Constrained by power, instruction-level parallelism, memory latency

Multiprocessors

Multicore microprocessors
• More than one processor per chip
• Requires explicitly parallel programming

Compare with instruction level parallelism
• Hardware executes multiple instructions at once
• Hidden from the programmer

Multicore programming is hard
• Programming for performance
• Load balancing
• Optimizing communication and synchronization

SPEC CPU Benchmark

Programs used to measure performance
• Supposedly typical of actual workload

Standard Performance Evaluation Corp (SPEC)
• Develops benchmarks for CPU, I/O, Web, …

SPEC CPU2006
• Elapsed time to execute a selection of programs
• Negligible I/O, so focuses on CPU performance
• Normalize relative to reference machine
• Summarize as geometric mean of performance ratios
• CINT2006 (integer) and CFP2006 (floating-point)

$$\sqrt[n]{\prod_{i=1}^{n} \text{Execution time ratio}_i}$$

CINT2006 for Intel Core i7 920

SPEC Power Benchmark

Power consumption of server at different workload levels
• Performance: ssj_ops/sec
• Power: Watts (Joules/sec)

$$\text{Overall ssj\_ops per Watt} = \left( \sum_{i=0}^{10} \text{ssj\_ops}_i \right) \Big/ \left( \sum_{i=0}^{10} \text{power}_i \right)$$

SPECpower_ssj2008 for Xeon X5650:

Fallacy: Low Power at Idle

Look back at i7 power benchmark
• At 100% load: 258W
• At 50% load: 170W (66%)
• At 10% load: 121W (47%)

Google data center
• Mostly operates at 10% – 50% load
• At 100% load less than 1% of the time

Consider designing processors to make power proportional to load
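The percentages on the idle-power slide follow directly from the quoted wattages, and the overall ssj_ops per Watt figure is a ratio of two sums. The following is a minimal sketch under stated assumptions: the function names `relative_power` and `overall_ssj_ops_per_watt` are illustrative, and only the three wattages come from the slides.

```python
def relative_power(power_w, peak_power_w):
    """Power drawn at some load level as a fraction of power at 100% load."""
    return power_w / peak_power_w


def overall_ssj_ops_per_watt(ssj_ops, power_w):
    """Sum of ssj_ops over all load levels divided by the sum of power.

    ssj_ops and power_w are sequences with one entry per load level.
    """
    if len(ssj_ops) != len(power_w):
        raise ValueError("need one power reading per load level")
    return sum(ssj_ops) / sum(power_w)


PEAK_W = 258.0  # at 100% load, from the slide

half_load = relative_power(170.0, PEAK_W)   # about 0.66 of peak power
tenth_load = relative_power(121.0, PEAK_W)  # about 0.47 of peak power

# A perfectly power-proportional machine would draw 0.50 and 0.10 of peak
# power at these load levels; the gap is the point of the fallacy.
excess_at_tenth_load = tenth_load - 0.10
```

At 10% load the machine still draws nearly half of its peak power, which matters because the data center described above spends most of its time between 10% and 50% load.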