Mattina

Architecture and Performance of
the TILE-Gx72 Manycore Processor
Matthew Mattina
Hot Interconnects 2013
CTO
Tilera
Agenda
• Overview
• iMesh Architecture
• Performance Results
© 2013 Tilera Corporation.
2
Hot Interconnects August 2013
TILE-Gx72™: At-a-Glance
Tile Array: 72x 64-bit cores
•
–
•
44K keys per second, zero core resources
PCIe 2.0
4 Lanes
DDR3 Controller
DDR3 Controller
Flexible
I/O
96 Gbps of dedicated PCIe and SR-IOV support
MiCA
4x DDR3 controllers @ 1600MT/sec
–
PCIe 2.0
4 Lanes
PCIe 2.0
8 Lanes
6x PCIe ports
–
•
Feeds packets into the mPIPE, 120MPPS
MiCA™: Cryptographic and RSA Engines
–
•
PCIe 2.0
8 Lanes
Dynamic flow affinity
8x 10GbE XAUI ports
DDR3 Controller
UART x2,
USB x2,
JTAG,
I2C, SPI
mPIPE™: Wirespeed programmable packet
processing and load balancing engine
–
•
75W TDP
DDR3 Controller
Network I/O
4x GbE
SGMII
10 GbE
XAUI
4x GbE
SGMII
10 GbE
XAUI
4x GbE
SGMII
10 GbE
XAUI
4x GbE
SGMII
10 GbE
XAUI
4x GbE
SGMII
10 GbE
XAUI
4x GbE
SGMII
10 GbE
XAUI
4x GbE
SGMII
10 GbE
XAUI
4x GbE
SGMII
10 GbE
XAUI
32-Lanes
–
MiCA
SerDes SerDes SerDes SerDes SerDes SerDes SerDes SerDes
1.2GHz, TSMC 40nm HPM
mPIPE
•
Tile = core + 256KB L2 cache + iMesh interface
TRIO
–
24-Lanes
•
> 50GB/s main memory BW
•
Standard SMP Linux, C/C++, Java, gdb,…
•
World’s Highest Single Chip CoreMark™ score
© 2013 Tilera Corporation.
45 x 45mm BGA package
3
Hot Interconnects August 2013
Markets and Solutions
Heterogeneous Devices
• TILE-Gx72 data plane + x86 control plane
–
–
–
–
Application Delivery Controllers
Firewalls
Server adaptor cards
Linux
Programmable
Emerging SDN/NFV
I/O Offload
TILE-Gx
Processor
X86
Processor
Homogeneous Devices
• TILE-Gx72 data and control plane
– Single-chip VPN/Router/Firewall
– IDS/IPS Appliances
– Video Transcoding Bridge
DRAM
TILE-Gx
Processor
DRAM
© 2013 Tilera Corporation.
4
Hot Interconnects August 2013
Agenda
• Overview
• iMesh Architecture
• Performance Results
© 2013 Tilera Corporation.
5
Hot Interconnects August 2013
iMesh: Scaling to 72 cores and beyond
iMesh: Multiple mesh networks and cache
coherence protocol interconnecting all ondie components
•
Single shared global physical address space;
direct cache access; full HW cache coherence
•
Scalable: more tiles = more interconnect
bandwidth and more shared cache BW
•
Three protocol classes, three physical
meshes, three different widths: keeps iMesh
router simple and fast
Tile
Throughput
•
4 × √𝑛
Number of Cores
© 2013 Tilera Corporation.
6
Hot Interconnects August 2013
iMesh: Physical and Link Layer
• Each iMesh interface is 5x5 xbar, connects
4 neighbors + core, point-to-point wires,
low power
Tile 0,0
core
Tile 0,1
core
• Fast arbiter: single cycle hop, same clock as
core
• Source-based, wormhole routing, route
headers coded for high-speed
• Link level FIFOs and per-hop flow control
• Handplaced M7/M8 routed over L2 cache
minimizes wiring channel area
• Regular design accelerates time to market:
build/verify one Tile, then replicate
© 2013 Tilera Corporation.
7
Network Name
Link Width,
Bisection BW
SDN
128b, 346 GB/s
RDN
112b, 302 GB/s
QDN
64b, 173 GB/s
Hot Interconnects August 2013
iMesh: Caching and Coherence
• Every Tile contains private 32KB L1 I and
D caches and 256KB L2 cache
PCIe 2.0
8 Lanes
Flexible
I/O
MiCA
mPIPE
PCIe 2.0
4 Lanes
PCIe 2.0
4 Lanes
TRIO
24-Lanes
PCIe 2.0
8 Lanes
32-Lanes
UART
4x GbE
SGMII
10 GbE
XAUI
4x GbE
SGMII
10 GbE
XAUI
4x GbE
SGMII
10 GbE
XAUI
4x GbE
SGMII
10 GbE
XAUI
4x GbE
SGMII
10 GbE
XAUI
4x GbE
SGMII
10 GbE
XAUI
4x GbE
SGMII
10 GbE
XAUI
4x GbE
SGMII
10 GbE
XAUI
SerDes SerDes SerDes SerDes SerDes SerDes SerDes SerDes
• L2 caches private L2 lines and global
“L3” lines
• Distributed coherence directory tracks
sharers, invalidates shared copies on
writes
Network I/O
MiCA
• Flexible cache line distribution
© 2013 Tilera Corporation.
8
Hot Interconnects August 2013
iMesh Protocol - Remote Write Hit with Sharer
Tile 0
Directory
Tile 1
TAG
DATA
TAG(P)
0x0
Directory
Store V, 0x1
V (a virtual address)‫‏‬
1
TLB
TAG
DATA
TAG(P)
0x0
Tile 2
Directory
TAG
DATA
0, 1
TAG(P)
0x0
P (a physical address),
Home Tile coordinates
DDR3
© 2013 Tilera Corporation.
9
Hot Interconnects August 2013
iMesh Protocol - Remote Write Hit with Sharer
Tile 0
Directory
TAG
TAG(P)
Tile 1
6
DATA
2
0x42
0x1
Data
invalidated!
Directory
Local copy is
updated
TAG
DATA
TAG(P)
0x0
Cache controller writesthrough to home on SDN
Inval sent to
sharer on QDN
3
Store V, 0x1
Directory
V (a virtual address)‫‏‬
1
TLB
5
Tile 2
P (a physical address),
Home Tile coordinates
4
TAG
DATA
0, 1
TAG(P)
Write is
processed by
home cache and
sharer list is
inspected
0x0
DDR3
© 2013 Tilera Corporation.
10
Hot Interconnects August 2013
iMesh Protocol - Remote Write Hit with Sharer
Tile 0
Directory
TAG
TAG(P)
Tile 1
6
DATA
2
0x42
0x1
Data
invalidated!
Directory
Local copy is
updated
Store V, 0x1
0x42
Inval sent to
sharer on QDN
5
Tile 2
Directory
1
TAG(P)
7
3
TLB
DATA
Inval ack sent
over RDN
Cache controller writesthrough to home on SDN
V (a virtual address)‫‏‬
TAG
P (a physical address),
Home Tile coordinates
4
TAG
DATA
0, 1
TAG(P)
Write is
processed by
home cache and
sharer list is
inspected
0x0
DDR3
© 2013 Tilera Corporation.
11
Hot Interconnects August 2013
iMesh Protocol - Remote Write Hit with Sharer
Tile 0
Directory
TAG
TAG(P)
Tile 1
6
DATA
2
0x42
0x1
Data
invalidated!
Directory
Local copy is
updated
Cache controller writesthrough to home on SDN
Write ack sent back
to Tile 0 via RDN
3
9
1
TAG(P)
0x42
7
Inval sent to
sharer on QDN
Tile 2
Directory
TLB
DATA
Inval ack sent
over RDN
Store V, 0x1
V (a virtual address)‫‏‬
TAG
P (a physical address),
Home Tile coordinates
4
TAG
5
DATA
0,01
TAG(P)
0x42
0x1
Write is
Tile 1 is removed
processed by
from sharers and
home cache and
new data is
sharer list is
written to cache!
inspected
8
DDR3
© 2013 Tilera Corporation.
12
Hot Interconnects August 2013
iMesh at the Movies
NP Arb
NP Arb
RR Arb, XY Route
© 2013 Tilera Corporation.
NP Arb
13
Hot Interconnects August 2013
Agenda
• Overview
• iMesh Architecture
• Performance Results and Summary
© 2013 Tilera Corporation.
14
Hot Interconnects August 2013
Performance Results
H.264 Video Encode2
700
90
80
70
60
50
40
30
20
10
0
600
500
Fps
Gbps
TCP Throughput1
30x performance improvement
400
300
65x performance improvement
200
100
0
0
5
10
15
20
25
30
0
20
60
80
# cores
# cores
1
40
Application
Performance
TCP Throughput
80Gbps, 30 cores
h.264 1080p encode
22 channels, 72 cores
Suricata IDS, policy based rule set3
13Gbps, 72 cores
Packet forwarding with netfilter
80Gbps, 40 cores
512 connections, 1500B packets, “echo” server, full-duplex performance 2 “Toys_and_calendar” video sequence, 3Mbps, baseline profile
rules, typical traffic
3 677
© 2013 Tilera Corporation.
15
Hot Interconnects August 2013
Summary
The iMesh interconnect and cache coherence protocol enables
Tilera’s Gx family of Manycore processors to efficiently scale
performance from the TILE-Gx9 to the TILE-Gx72
1847-Ball BGA, 45 x 45
pin compatible, 1265-Ball BGA, 37.5 x 37.5
Gx-72
Gx-36
Gx-16
Gx-9
General
Sampling
In production
SOFTWARE COMPATIBLE
© 2013 Tilera Corporation.
16
Hot Interconnects August 2013
Thank You!