AMD's Jaguar Microarchitecture

Alex Avery, Cody Smith
Agenda
● AMD Processors
● Jaguar Overview
● Example Hardware
● Core Pipeline
● Instruction Fetch and Cache
● Instruction Decoding
● Scheduling
● Integer & FP Execution
● Memory
● Cache
What is a Microarchitecture?
Microarchitecture is the computer organization
Microarchitecture + Instruction Set Architecture = Computer Architecture
A microarchitecture describes the electrical circuitry of the device; it is how the ISA is implemented.
AMD Processors
● Bobcat (2011)
● Piledriver (2012)
● Jaguar (2013)
● Steamroller (2014)
● Puma (2014)
● Excavator (2015)
Jaguar Overview
● Targets 2-25 W devices
● Low cost
● 28 nm technology
● Up to 4 cores
● Split L1 cache: 32 KiB instruction and 32 KiB data per core
● Unified L2 cache: 1-2 MiB, 16-way
● Out-of-order and speculative execution
● Integrated memory controller
● Two-way integer execution
● Two-way 128-bit floating-point execution
Example Hardware
● Gaming Consoles
  ○ Xbox One
  ○ PS4
● Desktop Processors
  ○ Athlon 5350
  ○ Sempron 3850
● Tablets
  ○ A6-1450
● Laptops/Mini PCs
  ○ A6-5200
  ○ E2-3000
● Embedded Processors
  ○ GX-420CA
Jaguar Core Pipeline
Instruction Fetch and Cache
● 6 stages
● 32 KB, 2-way set-associative L1 instruction cache
● Pseudo-least-recently-used (pseudo-LRU) replacement algorithm
● 32 B instruction fetch window
● Branch predictors exploit characteristics of both direct and indirect branches as well as branch density
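To make the cache geometry concrete, here is a minimal sketch of how an address could be split into tag, set index, and line offset for a 32 KB, 2-way cache. The 64-byte line size is an assumption (the slide does not state it), and `split_address` is a hypothetical helper, not anything from AMD documentation.

```python
# Sketch: address decomposition for a 32 KB, 2-way set-associative cache.
CACHE_BYTES = 32 * 1024  # from the slide
WAYS = 2                 # from the slide
LINE_BYTES = 64          # ASSUMPTION: line size is not given on the slide

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)  # sets = size / (ways * line)
OFFSET_BITS = LINE_BYTES.bit_length() - 1      # log2(line size)
INDEX_BITS = NUM_SETS.bit_length() - 1         # log2(number of sets)


def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, set_index, line_offset) for a byte address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


if __name__ == "__main__":
    print(NUM_SETS)                # 256 sets under the 64-byte assumption
    print(split_address(0x12345))  # (4, 141, 5)
```

With only two ways per set, a single bit per set is enough to remember which way was used least recently.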
Instruction Decoding
● Can decode two x86 instructions per cycle
● Variable-length x86 instructions are decoded into complex micro-operations (COPs)
● Supports 128-bit vector instructions as well as x86 Advanced Vector Extensions (AVX)
Scheduling
● Out-of-order execution
● After instructions are decoded into COPs, they are dispatched
● Each COP allocates a Retire Control Unit (RCU) entry
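The idea that COPs allocate entries in program order, finish in any order, and still retire in order can be sketched with a toy queue. `RetireQueue` and its methods are hypothetical names for illustration; this is not a model of the actual RCU.

```python
from collections import deque


class RetireQueue:
    """Toy in-order retire queue (illustrative only, not the real RCU)."""

    def __init__(self):
        # Each entry is [cop_name, done], kept in program (dispatch) order.
        self._entries = deque()

    def dispatch(self, cop):
        """Allocate an entry for a newly dispatched COP."""
        self._entries.append([cop, False])

    def complete(self, cop):
        """Mark a COP as executed; completion may happen in any order."""
        for entry in self._entries:
            if entry[0] == cop:
                entry[1] = True
                return
        raise KeyError(cop)

    def retire(self):
        """Retire finished COPs from the head only, preserving program order."""
        retired = []
        while self._entries and self._entries[0][1]:
            retired.append(self._entries.popleft()[0])
        return retired


if __name__ == "__main__":
    q = RetireQueue()
    for cop in ("a", "b", "c"):
        q.dispatch(cop)
    q.complete("c")
    print(q.retire())  # [] because "a" at the head is not done yet
    q.complete("a")
    print(q.retire())  # ['a']
    q.complete("b")
    print(q.retire())  # ['b', 'c']
```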
Integer Execution
● Separate Integer and Floating Point Units
● 2 symmetrical integer pipelines
● Integer addition/subtraction takes 3 cycles
  ○ Read operands
  ○ Execute
  ○ Write back
● 6-cycle multiplication
● Separate hardware divider
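As a rough illustration of why two pipelines help, the sketch below counts cycles for a batch of independent integer additions. It assumes fully pipelined units that each accept one operation per cycle and no data dependencies; those assumptions, and the helper name `cycles_for_independent_ops`, are not from the slides.

```python
import math


def cycles_for_independent_ops(n_ops, stages=3, pipes=2):
    """Cycles to finish n_ops independent operations.

    ASSUMPTIONS: each pipe is fully pipelined (one new op per cycle),
    ops have no dependencies, and every op takes `stages` cycles.
    """
    if n_ops == 0:
        return 0
    issue_cycles = math.ceil(n_ops / pipes)  # cycles needed to issue everything
    return stages + issue_cycles - 1         # last op still needs to drain


if __name__ == "__main__":
    print(cycles_for_independent_ops(1))           # 3
    print(cycles_for_independent_ops(10))          # 7
    print(cycles_for_independent_ops(10, pipes=1)) # 12
```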
Floating Point Execution
● Designed for 128-bit wide execution
● Targets SSE and AVX vector extensions
● 2 asymmetrical FP pipelines
● 4-7 cycles per addition/subtraction
  ○ Read operands (2 cycles)
  ○ Execute (1-4 cycles)
  ○ Write back (1 cycle)
● Co-processor architecture
  ○ Dedicated decode, rename, out-of-order scheduler and retire queue
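The 4-7 cycle range follows directly from the stage counts listed above. A short sketch that adds them up (the constant and function names are illustrative only):

```python
# Stage counts taken from the slide.
READ_CYCLES = 2
EXECUTE_CYCLES = range(1, 5)  # 1 to 4 cycles, depending on the operation
WRITEBACK_CYCLES = 1


def fp_add_latency(execute_cycles):
    """Total cycles for one FP add/sub given its execute-stage length."""
    if execute_cycles not in EXECUTE_CYCLES:
        raise ValueError("execute stage is 1-4 cycles on the slide")
    return READ_CYCLES + execute_cycles + WRITEBACK_CYCLES


latencies = [fp_add_latency(e) for e in EXECUTE_CYCLES]

if __name__ == "__main__":
    print(latencies)  # [4, 5, 6, 7]
```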
Memory
● Separate load and store pipelines
● Aggressive re-ordering
  ○ Loads can occur out-of-order
  ○ Loads can be moved ahead of stores before the target address is resolved
● Memory Ordering Queue and Store Queue handle memory ordering
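Moving a load ahead of a store whose address is still unknown is a speculation that must be checked later. The toy model below shows the general idea only; the `Store`, `Load`, `can_issue_speculatively`, and `needs_replay` names are hypothetical, and the real queues are certainly more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Store:
    seq: int              # position in program order
    addr: Optional[int]   # None until the store address is resolved


@dataclass
class Load:
    seq: int
    addr: int


def can_issue_speculatively(load, store_queue):
    """A load may go ahead unless an older store is KNOWN to hit its address.

    Older stores with unresolved addresses (addr is None) do not block it.
    """
    for st in store_queue:
        if st.seq < load.seq and st.addr == load.addr:
            return False
    return True


def needs_replay(load, store):
    """Check run when an older store resolves after the load already issued."""
    return store.seq < load.seq and store.addr == load.addr


if __name__ == "__main__":
    pending = Store(seq=1, addr=None)
    load = Load(seq=2, addr=0x100)
    print(can_issue_speculatively(load, [pending]))  # True: address unknown
    pending.addr = 0x100                             # store resolves later
    print(needs_replay(load, pending))               # True: speculation was wrong
```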
L1 Data Cache
● 32 KB
● 8-way associative
● Parity-protected writeback cache
● Pseudo-LRU replacement algorithm
● Can handle a 128-bit read and a 128-bit write each cycle
● Average latency of 3 cycles for an L1 hit
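One common way to approximate LRU for an 8-way set is a binary tree of 7 bits. The slides do not say which pseudo-LRU variant Jaguar uses, so the tree scheme below is an assumption chosen for illustration, and `TreePLRU8` is a hypothetical class.

```python
class TreePLRU8:
    """Tree pseudo-LRU for one 8-way set (ASSUMED variant, for illustration).

    Seven bits form a binary tree stored in heap order. Each bit points
    toward the subtree holding the next victim: 0 = left, 1 = right.
    """

    def __init__(self):
        self.bits = [0] * 7

    def touch(self, way):
        """Record an access: flip bits on the path to point away from `way`."""
        node = 0
        for level in (2, 1, 0):            # way bits, most significant first
            went_right = (way >> level) & 1
            self.bits[node] = 1 - went_right
            node = 2 * node + 1 + went_right

    def victim(self):
        """Follow the bits from the root to pick the way to replace."""
        node, way = 0, 0
        for _ in range(3):
            go_right = self.bits[node]
            way = (way << 1) | go_right
            node = 2 * node + 1 + go_right
        return way


if __name__ == "__main__":
    plru = TreePLRU8()
    for w in range(8):
        plru.touch(w)
    print(plru.victim())  # 0, same as true LRU here
    plru.touch(0)
    print(plru.victim())  # 4, whereas true LRU would pick way 1
```

The second result shows why the scheme is only "pseudo" LRU: it needs 7 bits per set instead of tracking a full ordering, at the price of occasionally evicting a line that is not the oldest.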
L2 Cache
● 1-2 MB (depending on application)
● 16-way set associative
● Unified, shared by 2 to 4 cores
● ECC (Error-Correcting Code) protection for tag and data arrays
● Forms an EDC/ECC cache structure
● Minimum of 25 cycles per hit
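The L1 and L2 latencies can be combined into a rough average access time. This sketch assumes that every L1 miss hits in L2 and that the 25 cycles is the total latency of such an access; the 5% miss rate in the usage lines is a made-up input, not a measurement.

```python
L1_HIT_CYCLES = 3    # from the L1 data cache slide
L2_HIT_CYCLES = 25   # minimum L2 hit latency from the slide


def average_access_cycles(l1_miss_rate):
    """Weighted average latency, ASSUMING every L1 miss hits in L2
    and that 25 cycles is the total latency of an L2 hit."""
    if not 0.0 <= l1_miss_rate <= 1.0:
        raise ValueError("miss rate must be between 0 and 1")
    return (1.0 - l1_miss_rate) * L1_HIT_CYCLES + l1_miss_rate * L2_HIT_CYCLES


if __name__ == "__main__":
    # Hypothetical 5% L1 miss rate, chosen only to show the arithmetic.
    print(round(average_access_cycles(0.05), 2))  # 4.1
```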
Jaguar Benchmarks
● Athlon 5350
● Athlon 5150
● Sempron 3850
Athlon 5350 vs. Intel Core i3 3220 vs. Celeron J1900
Athlon 5350 vs. Intel Core i7 5930K
The Athlon 5350 delivers much lower performance; however, it offers:
● Much better efficiency
● Much lower cost
● Better performance per watt
● Better performance per dollar
Zen
● Entirely new core design
● New design family ‘Summit Ridge’
● Simultaneous multithreading
● New cache system
● FinFET manufacturing process
Resources
http://www.anandtech.com/show/6976/amds-jaguar-architecture-the-cpu-powering-xbox-one-playstation-4-kabini-temash
http://www.realworldtech.com/jaguar/
http://www.tomshardware.com/reviews/microsoft-xbox-one-console-review,3681-3.html
https://nathanlamont91.wordpress.com/2015/03/22/my-report-on-the-amd-jaguar-quad-core-cpu/
https://www.deepdyve.com/lp/institute-of-electrical-and-electronics-engineers/the-floating-point-unit-of-the-jaguar-x86-core1TVYueOORA
http://www.xbitlabs.com/news/cpu/display/20120904201534_AMD_Discloses_Peculiarities_of_Next_Generation_Jaguar_Micro_Architecture.html