Machine Organization (CS 570)
Lecture 3: Instruction Set Principles and Examples*
Jeremy R. Johnson
Wed. Oct. 11, 2000
*This lecture was derived from material in the text (Chap. 2,
Appendices C and D).
All figures from Computer Architecture: A Quantitative Approach, Second
Edition, by John Hennessy and David Patterson, are copyrighted material
(COPYRIGHT 1996 MORGAN KAUFMANN PUBLISHERS, INC. ALL
RIGHTS RESERVED).
Introduction
• Objective: To examine the interface between the hardware
and the programmer, the Instruction Set Architecture (ISA), and to
present some design alternatives and examples.
• The ISA is the portion of the machine visible to the
programmer and compiler writer
• Topics
– Looking at assembly code
– Taxonomy and design alternatives
– Instruction set measurements
– DLX
Storage in the CPU
• Stack
• Accumulator
• Register
– register-memory
– register-register
Code sequence for C = A + B:

Stack     Accumulator   Register-Memory   Register-Register
Push A    Load A        Load R1, A        Load R1, A
Push B    Add B         Add R1, B         Load R2, B
Add       Store C       Store C, R1       Add R3, R1, R2
Pop C                                     Store C, R3
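
The code sequences for C = A + B can be mimicked with toy interpreters. The following is only a sketch: the function names, the tuple encoding of instructions, and the dictionary used as memory are assumptions made for illustration. The register-memory variant is omitted for brevity.

```python
# Toy models of three machine styles running C = A + B.
# Instruction encodings and the dict-based memory are illustrative assumptions.

def run_stack(program, mem):
    stack = []
    for op, *args in program:
        if op == "Push":                  # push a memory operand
            stack.append(mem[args[0]])
        elif op == "Add":                 # operands are implicit: top two stack entries
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "Pop":                 # pop the result to memory
            mem[args[0]] = stack.pop()
    return mem

def run_accumulator(program, mem):
    acc = 0
    for op, addr in program:
        if op == "Load":
            acc = mem[addr]
        elif op == "Add":                 # one operand is implicitly the accumulator
            acc += mem[addr]
        elif op == "Store":
            mem[addr] = acc
    return mem

def run_load_store(program, mem):
    regs = {}
    for op, *args in program:
        if op == "Load":                  # Load Rd, addr
            regs[args[0]] = mem[args[1]]
        elif op == "Add":                 # Add Rd, Rs1, Rs2 (registers only)
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        elif op == "Store":               # Store addr, Rs
            mem[args[0]] = regs[args[1]]
    return mem

stack_prog = [("Push", "A"), ("Push", "B"), ("Add",), ("Pop", "C")]
acc_prog = [("Load", "A"), ("Add", "B"), ("Store", "C")]
ls_prog = [("Load", "R1", "A"), ("Load", "R2", "B"),
           ("Add", "R3", "R1", "R2"), ("Store", "C", "R3")]

results = {
    "stack": run_stack(stack_prog, {"A": 2, "B": 3})["C"],
    "accumulator": run_accumulator(acc_prog, {"A": 2, "B": 3})["C"],
    "load_store": run_load_store(ls_prog, {"A": 2, "B": 3})["C"],
}
print(results)
```

All three models compute the same value; they differ in instruction count and in where the operands are named.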
General Purpose Register (GPR) Machine
• Why?
– Faster than memory access
– Simplify compiler’s task
• How many registers?
– Parameter passing, expression evaluation, variables
• How many operands, and of what type (register vs. memory)?
Notation: (memory operands, total operands)
– (0,3) Register-register
+ Simple fixed-length instruction encoding; instructions take a similar number of clocks
- Higher instruction count
– (1,2) Register-memory
+ Data can be accessed without loading first; easy to encode, with good density
- Operands not symmetric, variable number of clocks, may limit number of registers
– (3,3) Memory-memory
+ Most compact; doesn’t waste registers on temporaries
- Large variation in instruction size and number of clocks; memory bottleneck
Addressing Modes
• Register:          Add R4, R3
• Immediate:         Add R4, #3
• Displacement:      Add R4, 100(R1)
• Indirect:          Add R4, (R1)
• Indexed:           Add R3, (R1 + R2)
• Direct:            Add R1, (1001)
• Memory indirect:   Add R1, @(R3)
• Auto-increment:    Add R1, (R2)+
• Auto-decrement:    Add R1, -(R2)
• Scaled:            Add R1, 100(R2)[R3]
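
How each memory addressing mode forms its effective address can be sketched as follows. This is an illustrative model, not a definition from the slides: the function name, the register and memory dictionaries, and the default operand size d = 4 are assumptions. Register and immediate modes do not access memory, so they are left out.

```python
# Sketch: effective-address computation for the memory addressing modes.
# regs/mem dictionaries and the operand size d are illustrative assumptions.

def effective_address(mode, regs, mem, disp=0, base=None, index=None, d=4):
    if mode == "displacement":        # 100(R1)
        return disp + regs[base]
    if mode == "indirect":            # (R1)
        return regs[base]
    if mode == "indexed":             # (R1 + R2)
        return regs[base] + regs[index]
    if mode == "direct":              # (1001)
        return disp
    if mode == "memory_indirect":     # @(R3): the address is itself read from memory
        return mem[regs[base]]
    if mode == "autoincrement":       # (R2)+ : use the address, then add d
        ea = regs[base]
        regs[base] += d
        return ea
    if mode == "autodecrement":       # -(R2) : subtract d, then use the address
        regs[base] -= d
        return regs[base]
    if mode == "scaled":              # 100(R2)[R3]
        return disp + regs[base] + regs[index] * d
    raise ValueError(f"unknown mode: {mode}")

regs = {"R1": 200, "R2": 40, "R3": 8}
mem = {8: 1000}
ea_disp = effective_address("displacement", regs, mem, disp=100, base="R1")
ea_scaled = effective_address("scaled", regs, mem, disp=100, base="R2", index="R3")
ea_mem_ind = effective_address("memory_indirect", regs, mem, base="R3")
ea_inc = effective_address("autoincrement", regs, mem, base="R2")
print(ea_disp, ea_scaled, ea_mem_ind, ea_inc, regs["R2"])
```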
Summary of Use of Addressing Modes
Distribution of Displacement
Percentage Immediate Mode
Distribution Immediate Mode
Instruction Categories
• Arithmetic and Logical
• Data Transfer
• Control
• System
• Floating point
• Decimal
• String
• Graphics
Top Ten Instructions (Intel)
SPECint92
• Load 22%
• Conditional branch 20%
• Compare 16%
• Store 12%
• Add 8%
• And 6%
• Sub 5%
• Move reg, reg 4%
• Call 1%
• Return 1%
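
A quick tally of the listed percentages shows how much of the executed instruction mix these ten simple instructions cover. The grouping into memory-access and control instructions below is an illustrative choice, and the figures are the rounded values from the list.

```python
# Percentages as listed on the slide (Intel, SPECint92); values are rounded.
top_ten = {
    "load": 22, "conditional branch": 20, "compare": 16, "store": 12,
    "add": 8, "and": 6, "sub": 5, "move reg, reg": 4, "call": 1, "return": 1,
}

total = sum(top_ten.values())
memory_access = top_ten["load"] + top_ten["store"]
control = top_ten["conditional branch"] + top_ten["call"] + top_ten["return"]
print(total, memory_access, control)
```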
Control Transfer
• Conditional Branches
• Jumps
• Procedure calls
• Procedure returns
Implementing Control Transfer
• Condition Code
– Special bits are set by ALU operations
+ Sometimes set for free (typically not the case)
- Extra state; constrains the ordering of instructions
• Condition Register
– Test an arbitrary register holding the result of a comparison
+ Simple
- Uses up a register
• Compare and Branch
– Compare is part of the branch (often limited to a subset of comparisons)
+ One instruction rather than two
- May be too much work for one instruction
PC Relative Addressing
• Target address = PC + displacement
– Branch targets are typically nearby, so a short displacement suffices
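
A PC-relative target can be sketched as the PC plus a sign-extended displacement field. The 16-bit field width, the 32-bit address space, and the function names are assumptions for illustration; some machines add the displacement to the address of the following instruction rather than to the branch itself.

```python
# Sketch: PC-relative branch target with a signed displacement field.
# Field width (16 bits) and 32-bit wraparound are illustrative assumptions.

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit

def branch_target(pc, disp_field, bits=16):
    """Target = PC + sign-extended displacement, wrapped to 32 bits."""
    return (pc + sign_extend(disp_field, bits)) & 0xFFFFFFFF

forward = branch_target(0x1000, 0x0008)    # small forward branch
backward = branch_target(0x1000, 0xFFFC)   # field encodes -4
print(hex(forward), hex(backward))
```

Because only the distance is encoded, the same code works wherever it is loaded in memory.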
Encoding of Instruction Set
• Variable
• Fixed
• Hybrid
Compiler Optimizations
• High Level
– Procedure Inlining
• Local
– Common subexpression elimination
– Constant propagation
– Stack height reduction
• Global
– Global common subexpression elimination
– Copy propagation
– Code motion
– Induction variable elimination
• Machine Dependent
– Strength reduction
– Pipeline scheduling
– Branch offset optimization
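
Two of the listed transformations, common subexpression elimination and strength reduction, can be illustrated as a before/after pair. This is a hand-written sketch with hypothetical function names; it shows the effect of the optimizations, not how a compiler implements them.

```python
# Before: (a + b) is computed twice and a multiply by 4 is used.
def before(a, b, i):
    x = (a + b) * 4
    y = (a + b) * i
    return x + y

# After: the common subexpression is computed once, and the
# multiply by a power of two is replaced by a cheaper shift.
def after(a, b, i):
    t = a + b          # common subexpression elimination
    x = t << 2         # strength reduction: t * 4 becomes t << 2
    y = t * i
    return x + y

same = all(before(a, b, i) == after(a, b, i)
           for a in range(-3, 4) for b in range(-3, 4) for i in range(-3, 4))
print(same)
```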
Effect of Compiler Optimization
DLX
• Registers
– 32 32-bit GPRs (R0 - R31); R0 always reads as 0
– 32 SP FP registers (can be viewed as 16 DP FP registers)
– FP status register
• Data types
– 8-bit byte, 16-bit half word, 32-bit word, IEEE SP and DP FP
• Memory
– byte addressable, big-endian, 32-bit addresses
– memory accesses must be aligned
• Addressing Modes
– immediate
– displacement
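
The memory properties above (big-endian byte order and aligned accesses) can be made concrete with a short sketch. The helper names are hypothetical; `struct.pack` with the `>` prefix is used only as a convenient way to show big-endian byte layout.

```python
import struct

def word_bytes_big_endian(value):
    """Bytes of a 32-bit word in big-endian order (most significant byte first)."""
    return struct.pack(">I", value)

def is_aligned(address, size):
    """An access of `size` bytes is aligned if the address is a multiple of size."""
    return address % size == 0

layout = list(word_bytes_big_endian(0x12345678))
checks = {
    "word at 0x1004": is_aligned(0x1004, 4),
    "word at 0x1002": is_aligned(0x1002, 4),
    "half word at 0x1002": is_aligned(0x1002, 2),
}
print([hex(b) for b in layout], checks)
```

With big-endian ordering, the byte at the lowest address holds the most significant byte of the word.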
DLX Operations
• Data Transfer
– LB, LBU, SB
– LH, LHU, SH
– LW, SW
– LF, LD, SF, SD
– MOVI2S, MOVS2I
– MOVF, MOVD
– MOVFP2I, MOVI2FP
• Arithmetic/Logical
– ADD, ADDI, ADDU, ADDUI
– SUB, SUBI, SUBU, SUBUI
– MULT, MULTU, DIV, DIVU
– AND, ANDI
– OR, ORI, XOR, XORI
– LHI
– SLL, SRL, SRA, SLLI, SRLI, SRAI
– S__, S__I : “__” = LT, GT, LE, GE, EQ, NE
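
Two of the less familiar operations, LHI (load high immediate) and the set-on-condition family S__, can be modeled in a few lines. This is a sketch under stated assumptions: the Python function names are hypothetical, and `load_constant` assumes the low 16-bit immediate is zero-extended when it is ORed in.

```python
MASK32 = 0xFFFFFFFF

def lhi(imm16):
    """LHI: place a 16-bit immediate in the upper half of the register; lower half is 0."""
    return (imm16 & 0xFFFF) << 16

def slt(a, b):
    """SLT: the destination register gets 1 if a < b, else 0."""
    return 1 if a < b else 0

def load_constant(value):
    """Build a 32-bit constant as LHI followed by an OR of the low half.
    Assumes the low 16-bit immediate is zero-extended."""
    return (lhi(value >> 16) | (value & 0xFFFF)) & MASK32

constant = load_constant(0x12345678)
print(hex(lhi(0x1234)), hex(constant), slt(3, 5), slt(5, 3))
```

Since immediates are only 16 bits wide, a full 32-bit constant takes two instructions.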
DLX Operations (cont)
• Control
– BEQZ, BNEZ : 16-bit offset from PC+4
– BFPT, BFPF : 16-bit offset from PC+4
– J, JR : 26-bit offset from PC+4 (J)
– JAL, JALR : R31 = PC+4
– TRAP
– RFE
• Floating point
– ADDD, ADDF
– SUBD, SUBF
– MULTD, MULTF
– DIVD, DIVF
– CVTF2D, CVTF2I, CVTD2F, CVTD2I, CVTI2F, CVTI2D
– __D, __F : “__” = LT, GT, LE, GE, EQ, NE, sets bit in FP status register
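
The interaction between the FP compares, the FP status bit, and the control instructions can be sketched as below. The function names mirror the mnemonics but are otherwise hypothetical, offsets are taken as already sign-extended Python integers, and the register file is modeled as a plain list.

```python
# Sketch: FP compare sets a status bit; BFPT tests it; JAL links through R31.
# Offsets are assumed to be already sign-extended; regs is a 32-entry list.

def ltd(a, b):
    """LTD: compare two doubles and return the new FP status bit."""
    return a < b

def bfpt(pc, offset, fp_status):
    """BFPT: branch to (PC+4)+offset if the FP status bit is set, else fall through."""
    return pc + 4 + offset if fp_status else pc + 4

def jal(pc, offset, regs):
    """JAL: save the return address PC+4 in R31, then jump relative to PC+4."""
    regs[31] = pc + 4
    return pc + 4 + offset

taken = bfpt(0x100, 16, ltd(1.0, 2.0))
not_taken = bfpt(0x100, 16, ltd(2.0, 1.0))
regs = [0] * 32
call_target = jal(0x200, -8, regs)
print(hex(taken), hex(not_taken), hex(call_target), hex(regs[31]))
```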
DLX Instruction Format
• I-type
• R-type
• J-type
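
The three formats can be sketched as bit-field packing into a 32-bit word. The field widths used here (6-bit opcode, 5-bit register specifiers, 16-bit immediate, 11-bit function field, 26-bit offset) follow the DLX format figure in the text, which is not reproduced on this slide. The opcode values in the example are placeholders, not real DLX opcodes.

```python
# Sketch: packing and unpacking DLX-style 32-bit instruction words.
# Opcode numbers used in the example are placeholders, not real DLX opcodes.

def encode_i_type(opcode, rs1, rd, imm):
    """I-type: opcode(6) | rs1(5) | rd(5) | immediate(16)."""
    return ((opcode & 0x3F) << 26) | ((rs1 & 0x1F) << 21) | \
           ((rd & 0x1F) << 16) | (imm & 0xFFFF)

def encode_r_type(opcode, rs1, rs2, rd, func):
    """R-type: opcode(6) | rs1(5) | rs2(5) | rd(5) | func(11)."""
    return ((opcode & 0x3F) << 26) | ((rs1 & 0x1F) << 21) | \
           ((rs2 & 0x1F) << 16) | ((rd & 0x1F) << 11) | (func & 0x7FF)

def encode_j_type(opcode, offset):
    """J-type: opcode(6) | offset(26)."""
    return ((opcode & 0x3F) << 26) | (offset & 0x3FFFFFF)

def decode_i_type(word):
    """Return (opcode, rs1, rd, immediate) from an I-type word."""
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, word & 0xFFFF

i_word = encode_i_type(0x08, 1, 2, 100)      # placeholder opcode 0x08
r_word = encode_r_type(0x00, 1, 2, 3, 0x20)  # placeholder opcode and func
j_word = encode_j_type(0x02, -4)             # negative offset wraps to 26 bits
print(hex(i_word), hex(r_word), hex(j_word), decode_i_type(i_word))
```

Keeping the opcode and the first register specifier in the same position across formats is what makes fixed-length decoding simple.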
Distribution of Instructions in DLX
Distribution of Instructions in DLX
Effectiveness of DLX