Chapter 4: Processor Architecture
How does the hardware execute the instructions?
We’ll see by studying an example system
Based on simple instruction set devised for this purpose
Y86, inspired by x86
Fewer data types, instructions, addressing modes
Simpler encodings
Reasonably complete for integer programs
We’ll design hardware to implement Y86 ISA
Basic building blocks
Sequential implementation
Pipelined implementation
Instruction Set Architecture
Defines interface between hardware and software
Software spec is assembly language
State: registers, memory
Instructions, encodings
Hardware must execute instructions correctly
May use variety of transparent tricks to make execution fast.
Results must match sequential execution.
ISA is a layer of abstraction
Above: how to program machine
Below: what needs to be built
[Figure: abstraction layers, top to bottom: Application Program, Compiler, OS, ISA, CPU Design, Circuit Design, Chip Layout. "Where Are We Now?" marks course coverage: CS142 & 124, IT344]
Y86 Processor and System State
[Figure: Y86 processor state. RF: program registers (%eax, %ecx, %edx, %ebx, %esi, %edi, %esp, %ebp); CC: condition codes (ZF, SF, OF); PC; Stat: program status; DMEM: memory]
Program Registers
Same 8 as with IA32. Each 32 bits
Condition Codes
Single-bit flags as in x86: OF (Overflow), ZF (Zero), SF (Sign: set when result is negative)
Program Counter
Indicates address of instruction
Memory
Byte-addressable storage, words in little-endian byte order
Stat
Indicates exceptional outcomes (bad opcode, bad address, halt)
Y86 Instructions
Format
1 to 6 bytes of information read from memory
Can determine instruction length from first byte
Not as many instruction types, and simpler encoding than IA32
Each accesses and modifies some portion of the CPU and system state
Program registers
Condition codes
Program counter
Memory contents
Encoding Registers
Each register has 4-bit ID
Register  ID     Register  ID
%eax      0      %esi      6
%ecx      1      %edi      7
%edx      2      %esp      4
%ebx      3      %ebp      5
Similar encoding used in IA32
But we never deciphered encoding to notice!
Register ID 0xF indicates “no register”
Will use this in our hardware design in multiple places
Could otherwise encode register # in 3 bits
Simplifies decoding of instructions
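A minimal C sketch of the register encoding described above. The names (y86_reg, R_NONE, pack_regs, unpack_regs) are hypothetical and used only for illustration; the IDs come from the table on this slide, and the layout of two 4-bit IDs in one byte (rA high, rB low) follows the "rA rB" encodings used throughout.

```c
#include <assert.h>
#include <stdint.h>

/* Register IDs from the slide; R_NONE (0xF) means "no register". */
enum y86_reg {
    R_EAX = 0, R_ECX = 1, R_EDX = 2, R_EBX = 3,
    R_ESP = 4, R_EBP = 5, R_ESI = 6, R_EDI = 7,
    R_NONE = 0xF
};

/* Pack two 4-bit register IDs into one byte: rA in the high nibble, rB in the low. */
static uint8_t pack_regs(enum y86_reg ra, enum y86_reg rb)
{
    assert((unsigned)ra <= 0xF && (unsigned)rb <= 0xF);
    return (uint8_t)(((unsigned)ra << 4) | (unsigned)rb);
}

/* Split a register byte back into its two IDs. */
static void unpack_regs(uint8_t byte, enum y86_reg *ra, enum y86_reg *rb)
{
    *ra = (enum y86_reg)(byte >> 4);
    *rb = (enum y86_reg)(byte & 0xF);
}
```

Because every register field is a full nibble, a decoder can split the byte with one shift and one mask, and it can test for 0xF instead of needing a separate "register present" flag.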
Instruction Example
Addition instruction
Generic form: addl rA, rB
Encoded representation: 6 0 rA rB
Add value in register rA to that in register rB
Store result in register rB
Y86 allows addition to be applied to register data only
Set condition codes based on result
Two-byte encoding
First byte indicates instruction type
Second gives source and destination registers
e.g., addl %eax,%esi has encoding 60 06
Arithmetic and Logical Operations
Instruction            Generic form    Encoding
Add                    addl rA, rB     6 0 rA rB
Subtract (rA from rB)  subl rA, rB     6 1 rA rB
And                    andl rA, rB     6 2 rA rB
Exclusive-Or           xorl rA, rB     6 3 rA rB
In each encoding, the first nibble (6) is the instruction code and the second is the function code
Refer to generically as “OPl”
Encodings differ only by “function code”
Low-order 4 bits in first instruction byte
All set condition codes as side effect
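As a hedged illustration of the two-byte OPl encoding, the following C sketch builds the bytes for any of the four operations. The function and constant names (encode_opl, FN_ADDL, and so on) are assumptions made for this example, not part of any Y86 tool.

```c
#include <assert.h>
#include <stdint.h>

/* Function codes for the OPl family (low nibble of the first byte). */
enum { FN_ADDL = 0, FN_SUBL = 1, FN_ANDL = 2, FN_XORL = 3 };

/* Encode "OPl rA, rB" into out[0..1]; returns the instruction length in bytes. */
static int encode_opl(unsigned fn, unsigned ra, unsigned rb, uint8_t out[2])
{
    assert(fn <= 3 && ra <= 7 && rb <= 7);
    out[0] = (uint8_t)(0x60 | fn);        /* instruction code 6, function code fn */
    out[1] = (uint8_t)((ra << 4) | rb);   /* rA in high nibble, rB in low nibble */
    return 2;
}
```

Calling encode_opl(FN_ADDL, 0, 6, buf) reproduces the slide's example: addl %eax,%esi encodes as 60 06.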
Move Operations
Operation                Generic form        Encoding
Register --> Register    rrmovl rA, rB       2 0 rA rB
Immediate --> Register   irmovl V, rB        3 0 F rB V
Register --> Memory      rmmovl rA, D(rB)    4 0 rA rB D
Memory --> Register      mrmovl D(rB), rA    5 0 rA rB D
Similar to the IA32 movl instruction
Simpler format for memory addresses
Separated into different instructions to simplify hardware implementation
Move Instruction Examples
IA32                         Y86                         Encoding
movl $0xabcd, %edx           irmovl $0xabcd, %edx        30 f2 cd ab 00 00
movl %esp, %ebx              rrmovl %esp, %ebx           20 43
movl -12(%ebp),%ecx          mrmovl -12(%ebp),%ecx       50 15 f4 ff ff ff
movl %esi,0x41c(%esp)        rmmovl %esi,0x41c(%esp)     40 64 1c 04 00 00
movl $0xabcd, (%eax)         (none)
movl %eax, 12(%eax,%edx)     (none)
movl (%ebp,%eax,4),%ecx      (none)
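The six-byte move encodings can be sketched in C as follows. This is an illustration under stated assumptions: the helper names (put_le32, encode_irmovl, encode_mrmovl) are hypothetical, the unused register field of irmovl holds 0xF as in the "3 0 F rB" format, and the 4-byte constant is stored in little-endian order as the memory model specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value in little-endian order (least significant byte first). */
static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* irmovl V, rB  ->  30 F<rB> V */
static int encode_irmovl(uint32_t v, unsigned rb, uint8_t out[6])
{
    assert(rb <= 7);
    out[0] = 0x30;
    out[1] = (uint8_t)(0xF0 | rb);
    put_le32(out + 2, v);
    return 6;
}

/* mrmovl D(rB), rA  ->  50 <rA><rB> D */
static int encode_mrmovl(int32_t d, unsigned rb, unsigned ra, uint8_t out[6])
{
    assert(ra <= 7 && rb <= 7);
    out[0] = 0x50;
    out[1] = (uint8_t)((ra << 4) | rb);
    put_le32(out + 2, (uint32_t)d);   /* negative displacements wrap to two's complement */
    return 6;
}
```

For example, encode_mrmovl(-12, 5, 1, buf) yields 50 15 f4 ff ff ff, matching the mrmovl -12(%ebp),%ecx row.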
Jump Instructions
Instruction                   Generic form    Encoding
Jump Unconditionally          jmp Dest        7 0 Dest
Jump When Less or Equal       jle Dest        7 1 Dest
Jump When Less                jl Dest         7 2 Dest
Jump When Equal               je Dest         7 3 Dest
Jump When Not Equal           jne Dest        7 4 Dest
Jump When Greater or Equal    jge Dest        7 5 Dest
Jump When Greater             jg Dest         7 6 Dest
Refer to generically as “jXX”
Encodings differ only by “function code”
Based on values of condition codes
Same as IA32 counterparts
Encode full destination address
Unlike PC-relative addressing in IA32
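How each jXX function code depends on the condition codes can be sketched in C. The struct and function names are hypothetical. The conditions are the standard signed-comparison tests used by the IA32 counterparts (for example, "less" is SF xor OF).

```c
#include <assert.h>
#include <stdbool.h>

struct cc { bool zf, sf, of; };

/* Returns true when the jump with function code fn (0..6) is taken. */
static bool jump_taken(unsigned fn, struct cc c)
{
    bool lt = (c.sf != c.of);        /* signed less-than: SF ^ OF */

    assert(fn <= 6);
    switch (fn) {
    case 0:  return true;            /* jmp */
    case 1:  return lt || c.zf;      /* jle */
    case 2:  return lt;              /* jl  */
    case 3:  return c.zf;            /* je  */
    case 4:  return !c.zf;           /* jne */
    case 5:  return !lt;             /* jge */
    default: return !lt && !c.zf;    /* jg  */
    }
}
```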
Stack Operations
pushl rA    a 0 rA F
Decrement %esp by 4
Store word from rA to memory at %esp
Like IA32
popl rA     b 0 rA F
Read word from memory at %esp
Save in rA
Increment %esp by 4
Like IA32
Same stack conventions as IA32
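The push and pop steps listed above can be modeled with a short C sketch. The machine struct, the memory size, and the helper names are assumptions for illustration; only the order of operations (decrement then store for push, read then increment for pop) comes from the slide.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 0x200   /* arbitrary size chosen for this sketch */

struct machine {
    uint32_t esp;
    uint8_t mem[MEM_SIZE];
};

/* Write a 32-bit word in little-endian order. */
static void write_word(struct machine *m, uint32_t addr, uint32_t v)
{
    assert(addr <= MEM_SIZE - 4);
    for (int i = 0; i < 4; i++)
        m->mem[addr + i] = (uint8_t)(v >> (8 * i));
}

/* Read a 32-bit little-endian word. */
static uint32_t read_word(const struct machine *m, uint32_t addr)
{
    uint32_t v = 0;
    assert(addr <= MEM_SIZE - 4);
    for (int i = 0; i < 4; i++)
        v |= (uint32_t)m->mem[addr + i] << (8 * i);
    return v;
}

/* pushl: decrement %esp by 4, then store the word at the new %esp. */
static void push(struct machine *m, uint32_t val)
{
    m->esp -= 4;
    write_word(m, m->esp, val);
}

/* popl: read the word at %esp, then increment %esp by 4. */
static uint32_t pop(struct machine *m)
{
    uint32_t v = read_word(m, m->esp);
    m->esp += 4;
    return v;
}
```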
Subroutine Call and Return
call Dest    8 0 Dest
Push address of next instruction onto stack
Start executing instructions at Dest
Like IA32
ret          9 0
Pop value from stack
Use as address for next instruction
Like IA32
Miscellaneous Instructions
nop     0 0
Don’t do anything
halt    1 0
Stop executing instructions
IA32 has comparable instruction, but it can’t be executed in user mode
We will use this instruction to stop the simulator
Y86 Instruction Set
Instruction         Byte 0    Byte 1    Bytes 2-5
nop                 0 0
halt                1 0
rrmovl rA, rB       2 0       rA rB
irmovl V, rB        3 0       F rB      V
rmmovl rA, D(rB)    4 0       rA rB     D
mrmovl D(rB), rA    5 0       rA rB     D
OPl rA, rB          6 fn      rA rB
jXX Dest            7 fn      Dest (bytes 1-4)
call Dest           8 0       Dest (bytes 1-4)
ret                 9 0
pushl rA            A 0       rA F
popl rA             B 0       rA F
OPl function codes: addl 6 0, subl 6 1, andl 6 2, xorl 6 3
jXX function codes: jmp 7 0, jle 7 1, jl 7 2, je 7 3, jne 7 4, jge 7 5, jg 7 6
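Since the instruction length can be determined from the first byte, a decoder only needs the high nibble (the instruction code). A minimal C sketch, with a hypothetical function name and a return value of 0 chosen here to signal an invalid code:

```c
#include <assert.h>
#include <stdint.h>

/* Length in bytes of the instruction whose first byte is b0, or 0 if the
 * instruction code is not one of the twelve defined above. */
static int instr_length(uint8_t b0)
{
    switch (b0 >> 4) {
    case 0x0: case 0x1: case 0x9:
        return 1;                 /* nop, halt, ret */
    case 0x2: case 0x6: case 0xA: case 0xB:
        return 2;                 /* rrmovl, OPl, pushl, popl */
    case 0x7: case 0x8:
        return 5;                 /* jXX, call: code byte + 4-byte Dest */
    case 0x3: case 0x4: case 0x5:
        return 6;                 /* irmovl, rmmovl, mrmovl */
    default:
        return 0;                 /* invalid instruction code */
    }
}
```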
Writing Y86 Code
Best to use C compiler as much as possible
Write code in C
Compile for IA32 with gcc -S
Hand translate into Y86
Coding example
Find number of elements in null-terminated list
int len1(int a[]);
[Figure: a points to the list 5043, 6125, 7395, 0; the function returns 3]
Y86 Code Generation Example
First try
Write typical array code
/* Find number of elements in
   null-terminated list */
int len1(int a[])
{
    int len;
    for (len = 0; a[len]; len++)
        ;
    return len;
}
Compile with gcc -O2 -S
x86 code
L18:
    incl %eax
    cmpl $0,(%edx,%eax,4)
    jne L18
Problem
Hard to do array indexing on Y86: no scaled addressing modes
Y86 Code Generation Example #2
Second try
Revise to use pointers
/* Find number of elements in
   null-terminated list */
int len2(int a[])
{
    int len = 0;
    while (*a++)
        len++;
    return len;
}
Compile with gcc -O2 -S
x86 code
L5:
    movl (%edx),%eax
    incl %ecx
    addl $4,%edx
    testl %eax,%eax
    jne L5
Result
Doesn’t use indexed addressing
Y86 Code Generation Example #3
Hand translation: Setup
IA32 code
len2:
    pushl %ebp
    xorl %ecx,%ecx
    movl %esp,%ebp
    movl 8(%ebp),%edx
    movl (%edx),%eax
    jmp L6
Y86 code
len2:
    pushl %ebp             # Save %ebp
    xorl %ecx,%ecx         # len = 0
    rrmovl %esp,%ebp       # Set frame
    mrmovl 8(%ebp),%edx    # Get a
    mrmovl (%edx),%eax     # Get *a
    jmp L6                 # Goto loop test
Y86 Code Generation Example #4
Hand translation: Loop + Finish
IA32 code
L5:
    movl (%edx),%eax
    incl %ecx
L6:
    addl $4,%edx
    testl %eax,%eax
    jne L5
    movl %ebp,%esp
    movl %ecx,%eax
    popl %ebp
    ret
Y86 code
L5:
    mrmovl (%edx),%eax     # Get *a
    irmovl $1,%esi
    addl %esi,%ecx         # len++
L6:
    irmovl $4,%esi
    addl %esi,%edx         # a++
    andl %eax,%eax         # *a == 0?
    jne L5                 # No--Loop
    rrmovl %ebp,%esp       # Pop
    rrmovl %ecx,%eax       # Rtn len
    popl %ebp
    ret
Y86 Program Structure
    irmovl Stack,%esp      # Set up stack
    rrmovl %esp,%ebp       # Set up frame
    irmovl List,%edx
    pushl %edx             # Push argument
    call len2              # Call Function
    halt                   # Halt
.align 4
List:                      # List of elements
    .long 5043
    .long 6125
    .long 7395
    .long 0
# Function
len2:
    . . .
# Allocate space for stack
.pos 0x100
Stack:
Programmer must do more work; no compiler, linker, runtime system
Make program placement explicit
Stack initialization must be explicit (addr. 0x100)
Must ensure code is not overwritten!
Must initialize data
Can use symbolic names
Assembling Y86 Program
unix> yas eg.ys
Generates “object code” file eg.yo
Actually looks like disassembler output
ASCII file to make it easy for you to read
0x000: 30f400010000 |     irmovl Stack,%esp   # Set up stack
0x006: 2045         |     rrmovl %esp,%ebp    # Set up frame
0x008: 30f218000000 |     irmovl List,%edx
0x00e: a02f         |     pushl %edx          # Push argument
0x010: 8028000000   |     call len2           # Call Function
0x015: 10           |     halt                # Halt
0x018:              | .align 4
0x018:              | List:                   # List of elements
0x018: b3130000     |     .long 5043
0x01c: ed170000     |     .long 6125
0x020: e31c0000     |     .long 7395
0x024: 00000000     |     .long 0
Simulating Y86 Program
unix> yis eg.yo
Instruction set simulator
Computes effect of each instruction on processor state
Prints changes in state from original
Stopped in 41 steps at PC = 0x16. Exception 'HLT', CC Z=1 S=0 O=0
Changes to registers:
%eax:   0x00000000      0x00000003
%ecx:   0x00000000      0x00000003
%edx:   0x00000000      0x00000028
%esp:   0x00000000      0x000000fc
%ebp:   0x00000000      0x00000100
%esi:   0x00000000      0x00000004
Changes to memory:
0x00f4: 0x00000000      0x00000100
0x00f8: 0x00000000      0x00000015
0x00fc: 0x00000000      0x00000018
CISC Instruction Sets
CISC: Complex Instruction Set Computer
Dominant style of machines designed prior to ~1980
Stack-oriented instruction set
Use stack to pass arguments, save program counter
Explicit push and pop instructions
Arithmetic instructions can access memory
addl %eax, 12(%ebx,%ecx,4)
Requires memory read and write + complex address calculation
Condition codes
Set as side effect of arithmetic and logical instructions
Philosophy
Add instructions to perform “typical” programming tasks
RISC Instruction Sets
Reduced Instruction Set Computer
Early projects at IBM, Stanford (Hennessy), and Berkeley (Patterson)
Fewer, simpler instructions in ISA (initially)
Takes more instructions to perform same operations (relative to CISC)
But an instruction can execute faster on simpler hardware
Register-oriented instruction set
Many more (typically 32) registers
Used for arguments, return value and address, temporaries
Only load and store instructions can access memory
Similar to Y86 mrmovl and rmmovl
No condition codes
Test instructions return 0/1 in general purpose register
Example: MIPS Registers
Example: MIPS Instructions
R-R format: Op | Ra | Rb | Rd | 00000 | Fn
addu $3,$2,$1       # Register add: $3 = $2+$1
R-I format: Op | Ra | Rb | Immediate
addu $3,$2,3145     # Immediate add: $3 = $2+3145
sll $3,$2,2         # Shift left: $3 = $2 << 2
Branch format: Op | Ra | Rb | Offset
beq $3,$2,dest      # Branch when $3 = $2
Load/Store format: Op | Ra | Rb | Offset
lw $3,16($2)        # Load Word: $3 = M[$2+16]
sw $3,16($2)        # Store Word: M[$2+16] = $3
CISC vs. RISC Debate
Strong opinions at the time!
CISC arguments
Easy for compiler (bridge semantic gap)
Concise object code (memory was expensive)
RISC arguments
Simple is better for optimizing compilers
A simple CPU can be made to run very fast
Current status
For desktop processors, choice of ISA not a technical issue
With enough hardware, anything can be made to run fast
Code compatibility more important
For embedded processors, RISC makes sense
Smaller, cheaper, less power
4.1 Summary
Y86 instruction set architecture
Similar state and instructions as IA32
Simpler encodings
Small instruction set
Y86 somewhere between CISC and RISC
Changes from x86 consistent with RISC principles
4.2: Logic Design: A Brief Review
Fundamental hardware requirements
Communication
How to get values from one place to another
Computation
Storage
All are simplified by restricting to 0s and 1s
Communication
Low or high voltage on wire
Computation
Compute Boolean functions
Storage
Store bits of information
Communication: Digital Signals
[Figure: voltage vs. time waveform of a 1-bit signal, read as 0, 1, 0]
Use voltage thresholds to extract discrete values from continuous signal
Simplest version: 1-bit signal
Either high range (1) or low range (0)
With guard range between them
Not strongly affected by noise or low quality circuit elements
Can make circuits simple, small, and fast
Computation: Logic Gates
Outputs are Boolean functions of inputs
Respond continuously to changes in inputs
After some small delay
[Figure: voltage vs. time for inputs a and b and gate output a && b, showing the rising delay and falling delay of the output]
Combinational Circuits
[Figure: acyclic network of gates between primary inputs and primary outputs]
Acyclic network of logic gates
Continuously responds to changes on primary inputs
Primary outputs become (after some delay) Boolean functions of primary inputs
Bit Equality
[Figure: bit-equal circuit with inputs a and b and output eq]
HCL Expression
bool eq = (a&&b)||(!a&&!b)
Generate 1 if a and b are equal
Hardware control language (HCL)
Very simple hardware description language
Boolean operations have syntax similar to C logical operations
We’ll use it to describe control logic for processors
Much more convenient than drawing gates
Assumes compiler exists to turn HCL into gate equivalent
Word Equality
Word-Level Representation
[Figure: 32 bit-equal circuits compare a31..a0 with b31..b0, producing eq31..eq0, which are combined into Eq; the word-level symbol has inputs A and B, a comparator "=", and output Eq]
HCL Representation
bool Eq = (A == B)
32-bit word size
HCL representation
Equality operation
Generates Boolean value
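The relationship between the bit-level and word-level equality circuits can be sketched in C. The function names are hypothetical; bit_eq transcribes the HCL expression for eq, and word_eq ANDs together the 32 per-bit results, which is one plausible reading of how the figure combines eq0..eq31.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit-level equality, written exactly as the HCL expression. */
static bool bit_eq(bool a, bool b)
{
    return (a && b) || (!a && !b);
}

/* Word-level equality: all 32 bit-equal outputs must be 1. */
static bool word_eq(uint32_t A, uint32_t B)
{
    bool eq = true;
    for (int i = 0; i < 32; i++)
        eq = eq && bit_eq((A >> i) & 1, (B >> i) & 1);
    return eq;
}
```

In HCL the whole loop collapses to A == B; the gate-level structure is left to the HCL compiler.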
Bit-Level Multiplexer
[Figure: bit-level MUX with control input s, data inputs a and b, and output out]
HCL Expression
bool out = (s&&a)||(!s&&b)
Control signal s
Data signals a and b
Output a when s=1, b when s=0
Word Multiplexer
Word-Level Representation
[Figure: 32 bit-level multiplexers share control signal s and select between a31..a0 and b31..b0 to produce out31..out0; the word-level symbol is a MUX with inputs A and B, control s, and output Out]
HCL Representation
int Out = [
    s : A;
    1 : B;
];
Select input word A or B depending on control signal s
HCL representation
Case expression
Series of test : value pairs
Result value determined by first successful test
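A C rendering of the two multiplexers may make the case-expression semantics concrete. All function names are hypothetical. word_mux mirrors the HCL case expression (the first test that succeeds decides the result, and the test 1 acts as the default), while word_mux_bits shows the same behavior built from 32 bit-level multiplexers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit-level MUX, written as the HCL expression. */
static bool bit_mux(bool s, bool a, bool b)
{
    return (s && a) || (!s && b);
}

/* Word-level MUX as a case expression: first successful test wins. */
static uint32_t word_mux(bool s, uint32_t A, uint32_t B)
{
    if (s)
        return A;   /* s : A; */
    return B;       /* 1 : B;  (always true, so it serves as the default) */
}

/* The same selection built from 32 bit-level multiplexers sharing s. */
static uint32_t word_mux_bits(bool s, uint32_t A, uint32_t B)
{
    uint32_t out = 0;
    for (int i = 0; i < 32; i++)
        out |= (uint32_t)bit_mux(s, (A >> i) & 1, (B >> i) & 1) << i;
    return out;
}
```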
Arithmetic Logic Unit
[Figure: ALU with data inputs A and B, a control input, and condition-code outputs OF, ZF, CF, shown in four configurations with Y on input A and X on input B]
Control 0: X+Y
Control 1: X-Y
Control 2: X&Y
Control 3: X^Y
Combinational logic
Continuously responding to inputs
Control signal selects function computed
Corresponding to 4 arithmetic/logical operations in Y86
Also computes values for condition codes
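The ALU's behavior can be approximated in C as a pure function of its inputs. This is a sketch under assumptions: the names are hypothetical, the flags computed are the three Y86 condition codes (ZF, SF, OF) rather than the labels in the figure, and overflow is detected with the usual two's-complement sign rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct alu_out {
    uint32_t val;
    bool zf, sf, of;
};

/* fn selects the function: 0 = X+Y, 1 = X-Y, 2 = X&Y, 3 = X^Y. */
static struct alu_out alu(unsigned fn, uint32_t x, uint32_t y)
{
    struct alu_out r = { 0, false, false, false };
    bool xs = x >> 31, ys = y >> 31;   /* operand sign bits */

    assert(fn <= 3);
    switch (fn) {
    case 0:  r.val = x + y; break;
    case 1:  r.val = x - y; break;
    case 2:  r.val = x & y; break;
    default: r.val = x ^ y; break;
    }
    r.zf = (r.val == 0);
    r.sf = r.val >> 31;
    if (fn == 0)        /* add overflows when equal signs give a different sign */
        r.of = (xs == ys) && (r.sf != xs);
    else if (fn == 1)   /* subtract overflows when signs differ and the result's sign is not X's */
        r.of = (xs != ys) && (r.sf != xs);
    return r;
}
```

For example, alu(0, 0x7fffffff, 1) produces 0x80000000 with OF and SF set, the signed-overflow case.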
Edge-Triggered Latch (Flip Flop)
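[Figure: edge-triggered latch with inputs Data (D) and Clock (C), internal trigger signal T, and outputs Q+ and Q-; timing diagram of C, T, D, and Q+ vs. time]
Only in latching mode for brief period
On rising clock edge
Value latched depends on data as clock rises
Output remains stable at all other times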
Storage: Registers
Structure
[Figure: eight edge-triggered latches in parallel; inputs i7..i0 feed the D inputs, outputs o7..o0 come from Q+, and all latches share one Clock; the word-level symbol has input I, output O, and Clock]
Each stores word of data (one byte in above register)
Different from program registers (e.g., %eax)
Collection of edge-triggered latches
Loads input on rising edge of clock
Register Operation
[Figure: before the rising clock edge, State = x, Input = y, Output = x; after the rising clock edge, State = y, Output = y]
Stores data bits
For most of time acts as barrier between input and output
As clock rises, loads input
State Machine Example
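Accumulator circuit
Load or accumulate on each cycle
[Figure: combinational logic (an ALU adding In to Out, and a MUX that selects In when Load = 1 and the ALU sum when Load = 0) feeds a clocked register whose output is Out]
[Timing: In = x0, x1, x2, x3, x4, x5 with Load asserted for x0 and x3 gives Out = x0, x0+x1, x0+x1+x2, x3, x3+x4, x3+x4+x5]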
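The accumulator can be simulated one clock cycle at a time with a few lines of C. The struct and function names are hypothetical. The body follows the circuit: the combinational logic computes the next value from the current state and inputs, and the register takes that value on the rising clock edge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct accumulator {
    uint32_t out;   /* register state, visible as Out */
};

/* Model one rising clock edge. */
static void clock_tick(struct accumulator *acc, bool load, uint32_t in)
{
    uint32_t sum = acc->out + in;       /* ALU: Out + In */
    uint32_t next = load ? in : sum;    /* MUX: In when Load = 1, else the sum */
    acc->out = next;                    /* register loads as clock rises */
}
```

Feeding it the input sequence from the timing diagram, with Load asserted on the first and fourth cycles, reproduces the outputs x0, x0+x1, x0+x1+x2, x3, and so on.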
Storage: Random-Access Memory
[Figure: register file with read ports A (srcA, valA) and B (srcB, valB), write port W (dstW, valW), and Clock]
Stores multiple words of memory
Address input specifies which word to read or write
Register file
Holds values of program registers
– %eax, %esp, etc.
Register identifier serves as address
– ID 0xF implies no read or write performed
Multiple Ports
Can read and/or write multiple words simultaneously
– Each has separate address and data input/output
Register File Timing
Reading
Like combinational logic
Output data generated based on input address
After some delay
[Figure: with register 2 holding x, setting srcA = 2 or srcB = 2 produces valA = x or valB = x]
Writing
Like register (a few slides ago)
Update only as clock rises
[Figure: with dstW = 2 and valW = y, register 2 holds x until the rising clock edge and holds y afterward]
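The read and write behavior of the register file can be sketched in C. The names are hypothetical, and one detail is an assumption of this sketch rather than something the slides specify: a read with ID 0xF returns 0. Reads are modeled as a plain function of the address (like combinational logic), while writes happen only in the function that represents a rising clock edge.

```c
#include <assert.h>
#include <stdint.h>

#define RNONE 0xF   /* register ID meaning "no register" */

struct regfile {
    uint32_t r[8];
};

/* Read port: output depends only on the address input. */
static uint32_t rf_read(const struct regfile *rf, unsigned src)
{
    return src < 8 ? rf->r[src] : 0;   /* assumption: RNONE reads as 0 */
}

/* Write port: called once per rising clock edge. */
static void rf_clock(struct regfile *rf, unsigned dstW, uint32_t valW)
{
    if (dstW < 8)                      /* RNONE: no write performed */
        rf->r[dstW] = valW;
}
```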
4.2 Summary
Computation
Performed by combinational logic
Computes Boolean functions
Continuously reacts to input changes
Storage
Registers
Hold single words
Loaded as clock rises
Random-access memories
Hold multiple words
Multiple read and write ports possible
Read word anytime address input changes
Write word only on rising clock edge