CMP-MSI - EECG Toronto

Core to Memory Interconnection Implications for Forthcoming On-Chip Multiprocessors

Carmelo Acosta, Francisco J. Cazorla, Alex Ramírez, Mateo Valero
UPC-Barcelona
Barcelona Supercomputing Center
CMP-MSI
Feb. 11th 2007
Overview

• Introduction
• Simulation Methodology
• Results
• Conclusions
Introduction


• As process technology advances, deciding what to do with the available transistors becomes more important.
• The current trend is to replicate cores:
  - Intel: Pentium4, Core Duo, Core 2 Duo, Core 2 Quad
  - AMD: Opteron Dual-Core, Opteron Quad-Core
  - IBM: POWER4, POWER5
  - Sun Microsystems: Niagara T1, Niagara T2
Introduction
[Figure: chip layouts of Power4 (CMP) and Power5 (CMP+SMT).]
• The memory subsystem (green) spreads over more than half the chip area.
Introduction

• Each L1 is connected to each L2 bank through a bus-based interconnection network.
Goal


• Is prior research in the SMT field directly applicable to the new CMP+SMT scenario?
• No: well-known SMT ideas have to be revisited.
  → Instruction Fetch Policy
ICOUNT
[Figure: Fetch stage and ROB.]
ICOUNT
[Figure: Fetch stage and ROB; fetch is stalled on an L2 miss.]
• Processor resources are balanced between the running threads.
• All resources devoted to the blue thread remain unused until the L2 miss is resolved.
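As a rough illustration of the balancing behavior described above, here is a minimal sketch of an ICOUNT-style selection step, assuming the usual definition from the SMT literature: fetch priority goes to the thread with the fewest instructions in the pre-issue stages. The `ThreadState` class, its fields and the `icount_select` function are hypothetical names introduced for this example only.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    """Per-thread front-end state (field names are illustrative)."""
    tid: int
    in_flight: int          # instructions in decode, rename and the issue queues
    fetch_stalled: bool = False


def icount_select(threads, max_threads=2):
    """Return ids of the threads to fetch from this cycle, ICOUNT-style.

    Threads with the fewest in-flight front-end instructions go first;
    ties are broken by thread id to keep the choice deterministic.
    """
    ready = [t for t in threads if not t.fetch_stalled]
    ready.sort(key=lambda t: (t.in_flight, t.tid))
    return [t.tid for t in ready[:max_threads]]


threads = [ThreadState(0, in_flight=12), ThreadState(1, in_flight=30)]
print(icount_select(threads, max_threads=1))  # thread 0 has fewer in-flight instructions
```

Note that this policy only decides who fetches next. It does not reclaim entries already held by a thread waiting on an L2 miss, which is the weakness the slide points out.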
FLUSH
[Figure: Fetch stage and ROB; FLUSH is triggered on an L2 miss.]
All resources devoted to the pending instructions
of the blue thread are freed.
FLUSH
[Figure: Fetch stage and ROB; the thread is stalled on an L2 miss.]
• Freed resources allow the other threads to make additional forward progress.
• An L2 miss is detected late, so it is predicted instead (L2 miss prediction).
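The FLUSH mechanism and its miss prediction can be sketched as follows. This is a minimal illustration, not the simulator's implementation. It assumes a latency-threshold predictor: a load outstanding for longer than `threshold` cycles is treated as an L2 miss. The `Thread` class and both function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    tid: int
    rob_entries: list = field(default_factory=list)  # instruction ids, oldest first
    fetch_stalled: bool = False


def flush_on_predicted_miss(thread, load_id, cycles_outstanding, threshold):
    """Apply a FLUSH-style response once a load looks like an L2 miss.

    A load outstanding for more than `threshold` cycles is predicted to miss
    in L2. Instructions younger than the load are squashed, and the thread
    stops fetching until the miss resolves. Returns the number of freed
    ROB entries.
    """
    if cycles_outstanding <= threshold or load_id not in thread.rob_entries:
        return 0
    keep = thread.rob_entries.index(load_id) + 1   # keep the load and older work
    freed = len(thread.rob_entries) - keep
    del thread.rob_entries[keep:]
    thread.fetch_stalled = True
    return freed


def resume_after_miss(thread):
    """Re-enable fetch when the L2 miss has been serviced."""
    thread.fetch_stalled = False


blue = Thread(tid=0, rob_entries=[10, 11, 12, 13, 14])
freed = flush_on_predicted_miss(blue, load_id=11, cycles_outstanding=95, threshold=90)
print(freed, blue.rob_entries, blue.fetch_stalled)
```

With a threshold-based predictor, a slow L2 hit is indistinguishable from a miss, so the scheme depends on hit latency being predictable.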
Single vs Multi Core
[Figure: a single core with its I$ and D$ connected to four L2 banks (b0 to b3), compared with four cores, each with its own I$ and D$, sharing the same four banks.]
More pressure on both:
• Interconnection Network
• Shared L2 banks
Single vs Multi Core
[Figure: a single core connected to four L2 banks (b0 to b3), compared with four cores sharing the same four banks.]
More unpredictable L2 access latency
- BAD for FLUSH
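To make the link between core count and latency unpredictability concrete, here is a toy queueing sketch. It is not the simulator used in this work. It assumes each core sends requests to randomly chosen shared banks and that a bank serves one request at a time. All parameter values and the function name are illustrative assumptions.

```python
import random
import statistics


def simulate_l2_hit_latency(num_cores, num_banks=4, cycles=20000,
                            request_prob=0.1, bank_busy=2, base_latency=10,
                            seed=0):
    """Toy model: cores send L2 requests to randomly chosen shared banks.

    Each bank serves one request at a time and stays busy for `bank_busy`
    cycles. A request's latency is `base_latency` plus the cycles it waits
    for its bank. All parameters are illustrative, not taken from the slides.
    Returns (mean, standard deviation) of the observed latencies.
    """
    rng = random.Random(seed)
    bank_free_at = [0] * num_banks
    latencies = []
    for cycle in range(cycles):
        for _ in range(num_cores):
            if rng.random() < request_prob:
                bank = rng.randrange(num_banks)
                start = max(cycle, bank_free_at[bank])
                bank_free_at[bank] = start + bank_busy
                latencies.append(base_latency + (start - cycle))
    return statistics.mean(latencies), statistics.pstdev(latencies)


for cores in (1, 2, 3, 4):
    mean, spread = simulate_l2_hit_latency(cores)
    print(f"{cores} core(s): mean={mean:.2f} cycles, stdev={spread:.2f}")
```

In this model, adding cores raises the chance that a request finds its bank busy, so both the average latency and its spread grow with the core count.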
Simulation Methodology

• Trace-driven SMT simulator derived from SMTsim.
• C2T2, C3T2 and C4T2 multicore configurations (CXTY, where X = number of cores and Y = number of threads per core).
[Figure: four cores, each with its own I$ and D$, sharing L2 banks b0 to b3.]
[Table: Core Details (* per thread).]
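The CXTY naming convention can be decoded mechanically. The helper below is a small sketch, and `parse_config` is a hypothetical name rather than part of SMTsim.

```python
import re


def parse_config(name):
    """Parse a CXTY label into (cores, threads_per_core)."""
    match = re.fullmatch(r"C(\d+)T(\d+)", name)
    if match is None:
        raise ValueError(f"not a CXTY label: {name!r}")
    return int(match.group(1)), int(match.group(2))


for label in ("C2T2", "C3T2", "C4T2"):
    cores, threads = parse_config(label)
    print(label, "->", cores, "cores,", cores * threads, "hardware threads in total")
```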
Simulation Methodology


• Instruction Fetch Policies:
  - ICOUNT
  - FLUSH
• Workloads classified by type:
  - ILP → All threads have good memory behavior.
  - MEM → All threads have bad memory behavior.
  - MIX → Mixes both types of threads.
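The three workload classes can be expressed as a simple rule over per-thread memory behavior. The slides do not state how "good" and "bad" memory behavior are measured, so the sketch below assumes a per-thread L2 miss rate and a cut-off value. Both are hypothetical, as is the function name.

```python
def classify_workload(l2_miss_rates, mem_threshold=0.01):
    """Label a multithreaded workload as ILP, MEM or MIX.

    `l2_miss_rates` holds one value per thread. A thread counts as
    memory-bound when its rate exceeds `mem_threshold`; both the metric and
    the threshold are hypothetical, the slides do not state the criterion.
    """
    if not l2_miss_rates:
        raise ValueError("workload has no threads")
    memory_bound = [rate > mem_threshold for rate in l2_miss_rates]
    if all(memory_bound):
        return "MEM"
    if not any(memory_bound):
        return "ILP"
    return "MIX"


print(classify_workload([0.001, 0.002]))  # ILP
print(classify_workload([0.050, 0.080]))  # MEM
print(classify_workload([0.001, 0.080]))  # MIX
```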
Results: Single-Core (2 threads)

• FLUSH yields a 22% average speedup over ICOUNT in MIX workloads.
• The gains come mainly from MEM and MIX workloads.
Results: Multi-Core (2 threads/core)
More cores → lower speedup

• FLUSH drops to a 9% average slowdown relative to ICOUNT in a four-core configuration.
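For readers who want to see how such percentages could be derived, the sketch below computes the relative change of one policy's throughput against a baseline. It assumes that a slowdown is reported as a negative speedup and that throughput (for example IPC) is the metric being compared. The slides state neither, and the input values are hypothetical numbers chosen only to match the reported figures.

```python
def relative_change(policy_throughput, baseline_throughput):
    """Percentage change of a policy's throughput relative to a baseline."""
    if baseline_throughput <= 0:
        raise ValueError("baseline throughput must be positive")
    return 100.0 * (policy_throughput / baseline_throughput - 1.0)


# Hypothetical throughputs chosen only to reproduce the slide figures.
print(round(relative_change(1.22, 1.00)))   # 22  -> 22% speedup
print(round(relative_change(0.91, 1.00)))   # -9  -> 9% slowdown
```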
Results: L2 Hit Latency on Multi-Core
More cores → higher latency and more dispersion
[Figure: L2 hit latency (cycles).]
Results: L2 Miss Prediction

• In this four-core example, the best choice is to predict an L2 miss after 90 cycles.
Results: L2 Miss Prediction

• In this other four-core example, however, the best choice is not to predict L2 misses at all.
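One way to see why no single threshold works everywhere is to count how many L2 hits a threshold-based predictor would wrongly treat as misses. The sketch below assumes such a predictor. The latency samples are hypothetical values, not measurements from these experiments, and `false_flush_rate` is an illustrative name.

```python
def false_flush_rate(hit_latencies, threshold):
    """Fraction of L2 hits that a `threshold`-cycle predictor flags as misses."""
    flagged = sum(1 for latency in hit_latencies if latency > threshold)
    return flagged / len(hit_latencies)


# Hypothetical L2 hit latencies in cycles; not measurements from the slides.
tight_hits = [20, 22, 21, 23, 20, 24, 22, 21]
dispersed_hits = [20, 45, 95, 30, 120, 60, 25, 100]

for threshold in (30, 60, 90):
    print(threshold,
          false_flush_rate(tight_hits, threshold),
          false_flush_rate(dispersed_hits, threshold))
```

When hit latencies are tightly clustered, a modest threshold separates hits from misses cleanly. When they are widely dispersed, every threshold in the sweep still flags some hits, and each false flush discards useful work.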
Conclusions


• Future high-degree CMPs open challenging new research topics in CMP+SMT cooperation.
• The characteristics of the CMP outer cache level and interconnection may heavily affect SMT intra-core performance.

• For example, FLUSH relies on a predictable L2 hit latency, which is heavily affected in a CMP+SMT scenario.
• FLUSH drops from a 22% average speedup to a 9% average slowdown when moving from a single-core to a quad-core configuration.
Thank you
Questions?