
CS 704
Advanced Computer
Architecture
Lecture 6
Instruction Set Principles
(ISA Performance Analysis, Fallacies and Pitfalls)
Prof. Dr. M. Ashraf Chughtai
MAC/VU-Advanced
Computer Architecture
Lecture 6- Instruction Set Principles (3)
1
Today’s Topics
Recap Lecture 5
DSP Media Operations
ISA Performance
Putting it all Together
Summary
Recap: Lecture 5
Instruction encoding
- Essential elements of a computer instruction word:
  - Type of operands
  - Places of source and destinations
  - Place of next instruction
Instruction word length
- Variable length
- Fixed length
- Hybrid – variable fixed
Categories of hybrid length
4-, 3-, 2-, 1- and 0-address formats
Recap: Lecture 5
….. Cont’d
- Comparison of hybrid instruction word formats
  - The 1-address (accumulator) format requires the fewest memory bytes
  - The 4-address format requires the most
- MIPS instruction word format
  - MIPS is a RISC, fixed-length (32-bit instruction), 64-bit load/store architecture
  - It supports:
    - 8-, 16-, 32- and 64-bit operands
    - R-type, I-type and J-type formats
    - Arithmetic and logic operations
    - Data transfer operations
    - Control flow operations
Media and Signal Processing Operands
- Graphics applications deal with 2D and 3D images
- The 3D data type is called a vertex
- A vertex structure has four components:
  - x-coordinate
  - y-coordinate
  - z-coordinate
  - w-coordinate
- The x, y and z components give the position; the fourth (w) helps with color and hidden surfaces
- Three vertices specify a graphics primitive, such as a triangle
- Vertex values are usually 32-bit floating-point values
- DSPs add fixed point to the data types: a fraction with the binary point just to the right of the sign bit
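The fixed-point format described above can be sketched in a few lines. This is a minimal illustration, assuming a 16-bit word (one sign bit followed by 15 fraction bits, often called Q15); the word width and the names `to_q15` and `from_q15` are assumptions for illustration, not taken from the lecture.

```python
# Assumed format: 16-bit word, 1 sign bit + 15 fraction bits ("Q15").
# The binary point sits just to the right of the sign bit, so values lie in [-1, 1).
Q15_SCALE = 1 << 15  # 32768


def to_q15(x: float) -> int:
    """Convert a real number to a Q15 integer, clamping to the representable range."""
    n = int(round(x * Q15_SCALE))
    return max(-Q15_SCALE, min(Q15_SCALE - 1, n))


def from_q15(n: int) -> float:
    """Convert a Q15 integer back to the real number it represents."""
    return n / Q15_SCALE


print(to_q15(0.5))        # 16384
print(from_q15(-32768))   # -1.0
print(to_q15(1.0))        # 32767 (1.0 itself is not representable)
```

Note that +1.0 has no exact representation in this format, which is one reason saturating arithmetic matters for DSP code.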
3D Data Type
- A triangle is visible when it is depicted as filled with pixels
- Pixels are typically 32 bits, usually consisting of four 8-bit channels:
  - R: red
  - G: green
  - B: blue
  - A: transparency of the pixel when it is depicted
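As a rough sketch of the four-channel pixel layout, the helpers below pack and unpack a 32-bit pixel. The channel order (R in the most significant byte, A in the least) and the function names are assumptions for illustration; real graphics formats differ in ordering.

```python
def pack_rgba(r: int, g: int, b: int, a: int) -> int:
    """Pack four 8-bit channels into one 32-bit pixel (assumed order: R, G, B, A)."""
    for channel in (r, g, b, a):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must fit in 8 bits")
    return (r << 24) | (g << 16) | (b << 8) | a


def unpack_rgba(pixel: int) -> tuple:
    """Split a 32-bit pixel back into its (R, G, B, A) channels."""
    return (
        (pixel >> 24) & 0xFF,
        (pixel >> 16) & 0xFF,
        (pixel >> 8) & 0xFF,
        pixel & 0xFF,
    )


pixel = pack_rgba(255, 128, 0, 64)
print(f"{pixel:08x}")      # ff800040
print(unpack_rgba(pixel))  # (255, 128, 0, 64)
```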
Media and Signal Processing Operations
- Data for multimedia operations is usually much narrower than the 64-bit data word of modern processors
- Thus, a 64-bit word may be partitioned into four 16-bit data values so that the 64-bit ALU performs four 16-bit operations (say, four adds) in a single clock cycle
Media and Signal Processing Operations
- Here, extra hardware is added to prevent carries from propagating between the four 16-bit partitions of the 64-bit ALU
- These operations are called Single-Instruction Multiple-Data (SIMD) or vector operations
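The effect of blocking the carry between partitions can be simulated in software. The sketch below is an illustrative model, not a hardware description: `add_4x16` is a hypothetical helper that adds four 16-bit lanes independently, and its result is compared with an ordinary 64-bit add in which the carry spills into the next lane.

```python
MASK16 = 0xFFFF
MASK64 = (1 << 64) - 1


def add_4x16(a: int, b: int) -> int:
    """Add four independent 16-bit lanes packed in 64-bit words.

    Each lane wraps on its own; no carry crosses a lane boundary.
    """
    result = 0
    for lane in range(4):
        shift = 16 * lane
        x = (a >> shift) & MASK16
        y = (b >> shift) & MASK16
        result |= ((x + y) & MASK16) << shift
    return result


a = 0x0001_FFFF_0003_0004
b = 0x0001_0001_0001_0001
print(f"{add_4x16(a, b):016x}")        # 0002000000040005 (lane 2 wraps to 0000)
print(f"{(a + b) & MASK64:016x}")      # 0003000000040005 (carry leaks into lane 3)
```

The two results differ only in the top lane, which is exactly where the unwanted carry from the overflowing lane would land.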
Multimedia Operations
- Most graphics multimedia applications use 32-bit floating-point operations
- Some computers double single-precision throughput by allowing a single instruction to launch two 32-bit operations on operands found side by side in a double-precision register
- The table shown here summarizes SIMD instructions found in recent computers
Summary of SIMD instructions
in recent computers
Insert Table given in Fig. 2.17 from page 110
Multimedia Operations
- Note that there is very little in common across the five architectures
- All are fixed-width operations, performing multiple narrow operations on either a 64-bit or a 128-bit ALU
- The narrow operations are labeled by data width:
  - B: byte
  - H: half word
  - W: word
  - 8B: eight byte-wide operations in a single instruction (one double word in total)
Digital Signals Processing Issues
- Saturating add/subtract
  - Handles results that are too large (overflow)
- Result rounding
  - Choose a rounding mode, much as IEEE 754 offers several
- Multiply-accumulate
  - Vector and matrix dot-product operations
DSP Operations
- Saturating add/subtract
  - A DSP cannot ignore overflow, otherwise it may miss an event; therefore it uses saturating arithmetic
  - Here, if the result is too large to be represented, it is set to the largest representable number of the appropriate sign (the most positive or the most negative value)
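The difference between saturating and ordinary wrap-around arithmetic is easy to see in a small model. The sketch below assumes 16-bit signed operands; the width and the names `sat_add16` and `wrap_add16` are illustrative choices, not from the lecture.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def sat_add16(a: int, b: int) -> int:
    """Saturating add: clamp the true sum to the 16-bit signed range."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


def wrap_add16(a: int, b: int) -> int:
    """Ordinary two's-complement add: the sum wraps around on overflow."""
    s = (a + b) & 0xFFFF
    return s - 0x10000 if s & 0x8000 else s


print(sat_add16(30000, 10000))    # 32767
print(wrap_add16(30000, 10000))   # -25536
print(sat_add16(-30000, -10000))  # -32768
```

With wrap-around, a large positive signal suddenly appears as a large negative one; saturation keeps the result at the nearest representable extreme.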
DSP Operations
- Result rounding
  - Just as IEEE 754 offers several rounding modes, DSPs offer several modes for rounding a wide accumulator into a narrower data word, and select the appropriate one to round the result
- Multiply-accumulate (MAC)
  - MAC operations are the key to dot-product operations in vector and matrix multiplies, which need to accumulate a series of products
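A multiply-accumulate loop with a wide accumulator and a single final rounding step might look like the sketch below. It assumes 16-bit fixed-point inputs with 15 fraction bits (Q15) and a round-half-up rule; both are assumptions made for illustration, and a real DSP would offer a choice of rounding modes.

```python
def mac_dot_q15(xs, ys):
    """Dot product of two Q15 vectors using multiply-accumulate.

    Products carry 30 fraction bits and are summed in a wide accumulator;
    the sum is rounded once (half up, an assumed mode) and saturated to Q15.
    """
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    acc = 0
    for x, y in zip(xs, ys):
        acc += x * y                      # one multiply-accumulate step
    rounded = (acc + (1 << 14)) >> 15     # drop 15 fraction bits, rounding half up
    return max(-32768, min(32767, rounded))


# 0.5 * 0.5 + 0.25 * 0.5 = 0.375, which is 12288 in Q15
print(mac_dot_q15([16384, 8192], [16384, 16384]))  # 12288
```

Rounding once at the end, rather than after every product, is what the wider accumulator makes possible.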
ISA Performance
- Role of the compiler
  - The interaction of compilers and high-level languages significantly affects how programs use an ISA
  - Optimizations performed by compilers can be classified as follows:
Classification of Performance Optimization
- High-level optimization: often done on the source, with the output fed to later optimization passes
- Local optimization: done within a straight-line code fragment (basic block)
- Global optimization: extends the optimization across branches
- Register allocation: associates registers with operands
- Processor-dependent optimization: takes advantage of the specific architecture
Impact of Compiler Technology
- The interaction of compilers and high-level languages affects how a program uses an ISA
- Here, two important questions are:
  1: How are variables allocated?
  2: How many registers are needed to allocate variables appropriately?
- These questions are addressed by looking at the three areas in which high-level languages allocate data
Three areas of data allocation
1: Local variable area: stack
- It is used to allocate local variables
- It grows or shrinks on procedure call or return
- Objects on the stack are primarily scalars (single variables rather than arrays) and are addressed relative to the stack pointer
- Register allocation is much more effective for stack-allocated objects
Three areas of data allocation
… Cont’d:
2: Global data area
- It is used to allocate statically declared objects such as global variables and constants
- These objects are mostly arrays and other aggregate data structures
- Register allocation is relatively less effective for global variables
- Global variables may be aliased: there are multiple ways to refer to the same address, which makes it illegal to keep them in registers
Three areas of data allocation
… Cont’d:
3: Dynamic object allocation: heap
- It is used to allocate objects that do not adhere to a stack discipline
- Objects in the heap are accessed with pointers and are typically not scalars
- Most heap variables are aliased, so register allocation is almost impossible for the heap
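Why aliasing blocks register allocation can be shown with a small analogy. In the sketch below, a Python local variable stands in for a register and a list element stands in for a memory location; the function names are hypothetical. Keeping the value "in a register" gives the wrong answer as soon as a second name refers to the same object.

```python
def reload_after_store(values, other):
    first = values[0]           # load values[0]
    other[0] = 99               # store through a second name
    return first + values[0]    # reload: sees the store if the names alias


def keep_in_register(values, other):
    cached = values[0]          # load once and keep the copy "in a register"
    other[0] = 99               # store through a second name
    return cached + cached      # stale if other and values are the same object


# No aliasing: both versions agree.
print(reload_after_store([1, 2], [5, 6]))   # 2
print(keep_in_register([1, 2], [5, 6]))     # 2

# Aliasing: the same list is passed under both names.
a = [1, 2]
b = [1, 2]
print(reload_after_store(a, a))             # 100
print(keep_in_register(b, b))               # 2, which no longer matches memory
```

A compiler that cannot prove the two names are distinct must use the reloading version, which is why aliased global and heap data are poor candidates for registers.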
ISA Performance … Cont’d
- MIPS floating-point operations
  - The instructions manipulate the floating-point registers
  - They indicate whether the operation is to be performed in single precision or double precision
  - MOV.S copies a single-precision register to another of the same type
  - MOV.D copies a double-precision register to another of the same type
MIPS Floating-point Operations … Cont’d
- To get greater performance for graphics routines, MIPS64 offers paired-single instructions
- These instructions perform two 32-bit floating-point operations, one on each half of a 64-bit floating-point register
- Examples:
  - ADD.PS
  - SUB.PS
  - MUL.PS
  - DIV.PS
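The behavior of a paired-single add can be modeled in software as below. This is a sketch of the idea only: the helper names and the placement of the two values within the 64-bit register image (which half is "upper") are assumptions, not a statement of the MIPS64 encoding.

```python
import struct


def pack_ps(upper: float, lower: float) -> int:
    """Place two single-precision values side by side in a 64-bit register image."""
    return struct.unpack("<Q", struct.pack("<ff", lower, upper))[0]


def unpack_ps(reg: int) -> tuple:
    """Return the (upper, lower) single-precision halves of a 64-bit register image."""
    lower, upper = struct.unpack("<ff", struct.pack("<Q", reg))
    return upper, lower


def add_ps(fs: int, ft: int) -> int:
    """Model of a paired-single add: two independent 32-bit adds in one operation."""
    fs_upper, fs_lower = unpack_ps(fs)
    ft_upper, ft_lower = unpack_ps(ft)
    # Each sum is rounded to single precision when it is packed back.
    return pack_ps(fs_upper + ft_upper, fs_lower + ft_lower)


f0 = pack_ps(1.5, 2.0)
f2 = pack_ps(0.25, 4.0)
print(unpack_ps(add_ps(f0, f2)))  # (1.75, 6.0)
```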
Putting it All Together
- The earliest architectures were limited in their instruction sets by the hardware technology of that time
- In the 1960s, stack architectures became popular, viewed as a good match for high-level languages
- In the 1970s, the main concern of architects was to reduce software cost, which produced high-level architectures such as the VAX
Putting it All Together .. Cont’d
- In the 1980s, a return to simpler architectures took place, made possible by sophisticated compiler technology
- In the 1990s, new architectural features were introduced; these include:
Putting it All Together .. Cont’d
1990s Architectures
1: Address size doubles: 32-bit to 64-bit
2: Optimization of conditional branches via conditional execution, e.g. conditional move
3: Optimization of cache performance via prefetch, added because the memory hierarchy had grown in importance to the performance of computers
4: Multimedia support
5: Faster floating-point instructions
6: Long instruction words
Concluding the Instruction set Principles
Three pillars of Computer Architecture
Hardware, Software and Instruction Set
Instruction Set
Interface between hardware and software
Taxonomy of Instruction Set:
Stack, Accumulator and General Purpose Register
Types and Size of Operands:
Types: Integer, FP and Character
Size: Half word, word, double word
Classification of operations
Arithmetic, data transfer, control and support
Concluding the Instruction set Principles… Cont’d
Operand Addressing Modes
Immediate, register, direct (absolute) and indirect
Classification of Indirect Addressing
Register, indexed, relative (i.e. with displacement) and memory
Special Addressing Modes
Auto-increment, auto-decrement and scaled
Control Instruction Addressing modes
Branch, jump and procedure call/return
Concluding the Instruction set Principles… Cont’d
Instruction encoding
- Essential elements of computer instructions: type of operands, places of source and destinations, and place of next instruction
Instruction word length
Variable, fixed length and hybrid
Hybrid length taxonomy
4-, 3-, 2-, 1- and 0-address formats
Comparison of hybrid instruction word formats
The 1-address (accumulator) format requires the fewest memory bytes and the 4-address format the most
Concluding the Instruction set Principles… Cont’d
MIPS Instruction word format
- MIPS is a RISC, fixed-length (32-bit instruction), 64-bit load/store architecture
- It supports:
  - 8-, 16-, 32- and 64-bit operands
  - R-type, I-type and J-type formats
  - Arithmetic and logic operations
  - Data transfer operations
  - Control flow operations
Concluding the Instruction set Principles… Cont’d
- Multimedia and digital signal processing operands
  - Graphics applications deal with 2D and 3D images
  - DSPs add fixed point to the data types: binary point just to the right of the sign bit
- Multimedia and digital signal processing operations
  - All are fixed-width operations, performing multiple narrow operations on either a 64-bit or a 128-bit ALU
  - The narrow operations are B (byte), H (half word) and W (word); 8B means eight byte-wide operations in one instruction
- Multimedia and digital signal processing issues
  - Saturating add/subtract
  - Result rounding
  - Multiply-accumulate
Concluding the Instruction set Principles… Cont’d
- ISA performance
  - Role of the compiler: the interaction of compilers and high-level languages significantly affects how programs use an ISA
Allah Hafiz
and
Asalm-u-Alacum
Practice Problems
Quantitative Principles [Lecture 2-3]
Instruction Set Principles [Lecture 4-5]
Practice Problems
Quantitative Principles [Lecture 2-3]
1: Computer hardware is designed using an ISA having three types (Type A, B and C) of instructions. The clock cycles per instruction (CPI) for each type of instruction are as follows:

Instruction type    CPI
Type A              2
Type B              3
Type C              4

A compiler writer has written two different code sequences, with different instruction counts, to evaluate an expression, as given below.

                    Instruction count for instruction type
Code sequence       A       B       C
1                   2       1       4
2                   3       2       1

a) What is the instruction count of each sequence?
b) Which of the sequences is faster?
c) What is the (average) CPI of each sequence?
Solution to Practice Problem 1
a) Instruction count of Sequence 1 = 2+4+1 = 7
   Instruction count of Sequence 2 = 1+1+4 = 6
   Result: Sequence 2 executes fewer instructions
b) To find which sequence is faster, we have to find the CPU clock cycles for each sequence
   CPU clock cycles for Sequence 1 = 2x2 + 3x4 + 4x1 = 20 cycles
   CPU clock cycles for Sequence 2 = 2x3 + 3x2 + 4x4 = 28 cycles
   Result: Sequence 1 is faster
c) To find the CPI (CPU clock cycles / instruction count) of each sequence
   CPI for Sequence 1 = 20/7 = 2.86
   CPI for Sequence 2 = 28/6 = 4.67
   Result: Sequence 2, which has fewer instructions, has the higher CPI and is thus slower
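The bookkeeping used in this solution (instruction count, CPU clock cycles, average CPI) can be sketched as a small helper. The per-type CPI values come from the problem statement; the function name `analyze` is hypothetical, and the first instruction mix is the one appearing in the Sequence 1 cycle calculation (2 of type A, 4 of type B, 1 of type C). The second mix is purely illustrative.

```python
# Clock cycles per instruction for each instruction type (from the problem statement).
CPI_BY_TYPE = {"A": 2, "B": 3, "C": 4}


def analyze(counts):
    """Return (instruction count, CPU clock cycles, average CPI) for an instruction mix."""
    instructions = sum(counts.values())
    cycles = sum(CPI_BY_TYPE[kind] * n for kind, n in counts.items())
    return instructions, cycles, cycles / instructions


instructions, cycles, cpi = analyze({"A": 2, "B": 4, "C": 1})
print(instructions, cycles, round(cpi, 2))   # 7 20 2.86

# A hypothetical mix made only of the cheapest instruction type.
print(analyze({"A": 5, "B": 0, "C": 0}))     # (5, 10, 2.0)
```

Since both sequences run on the same clock, comparing total clock cycles is enough to decide which is faster; the average CPI alone does not decide it unless the instruction counts are also taken into account.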
Practice Problems
Instruction Set Principles [Lecture 4-5]