Architecture and Performance of the TILE-Gx72 Manycore Processor Matthew Mattina Hot Interconnects 2013 CTO Tilera Agenda • Overview • iMesh Architecture • Performance Results © 2013 Tilera Corporation. 2 Hot Interconnects August 2013 TILE-Gx72™: At-a-Glance Tile Array: 72x 64-bit cores • – • 44K keys per second, zero core resources PCIe 2.0 4 Lanes DDR3 Controller DDR3 Controller Flexible I/O 96 Gbps of dedicated PCIe and SR-IOV support MiCA 4x DDR3 controllers @ 1600MT/sec – PCIe 2.0 4 Lanes PCIe 2.0 8 Lanes 6x PCIe ports – • Feeds packets into the mPIPE, 120MPPS MiCA™: Cryptographic and RSA Engines – • PCIe 2.0 8 Lanes Dynamic flow affinity 8x 10GbE XAUI ports DDR3 Controller UART x2, USB x2, JTAG, I2C, SPI mPIPE™: Wirespeed programmable packet processing and load balancing engine – • 75W TDP DDR3 Controller Network I/O 4x GbE SGMII 10 GbE XAUI 4x GbE SGMII 10 GbE XAUI 4x GbE SGMII 10 GbE XAUI 4x GbE SGMII 10 GbE XAUI 4x GbE SGMII 10 GbE XAUI 4x GbE SGMII 10 GbE XAUI 4x GbE SGMII 10 GbE XAUI 4x GbE SGMII 10 GbE XAUI 32-Lanes – MiCA SerDes SerDes SerDes SerDes SerDes SerDes SerDes SerDes 1.2GHz, TSMC 40nm HPM mPIPE • Tile = core + 256KB L2 cache + iMesh interface TRIO – 24-Lanes • > 50GB/s main memory BW • Standard SMP Linux, C/C++, Java, gdb,… • World’s Highest Single Chip CoreMark™ score © 2013 Tilera Corporation. 45 x 45mm BGA package 3 Hot Interconnects August 2013 Markets and Solutions Heterogeneous Devices • TILE-Gx72 data plane + x86 control plane – – – – Application Delivery Controllers Firewalls Server adaptor cards Linux Programmable Emerging SDN/NFV I/O Offload TILE-Gx Processor X86 Processor Homogeneous Devices • TILE-Gx72 data and control plane – Single-chip VPN/Router/Firewall – IDS/IPS Appliances – Video Transcoding Bridge DRAM TILE-Gx Processor DRAM © 2013 Tilera Corporation. 4 Hot Interconnects August 2013 Agenda • Overview • iMesh Architecture • Performance Results © 2013 Tilera Corporation. 5 Hot Interconnects August 2013 iMesh: Scaling to 72 cores and beyond iMesh: Multiple mesh networks and cache coherence protocol interconnecting all ondie components • Single shared global physical address space; direct cache access; full HW cache coherence • Scalable: more tiles = more interconnect bandwidth and more shared cache BW • Three protocol classes, three physical meshes, three different widths: keeps iMesh router simple and fast Tile Throughput • 4 × √𝑛 Number of Cores © 2013 Tilera Corporation. 6 Hot Interconnects August 2013 iMesh: Physical and Link Layer • Each iMesh interface is 5x5 xbar, connects 4 neighbors + core, point-to-point wires, low power Tile 0,0 core Tile 0,1 core • Fast arbiter: single cycle hop, same clock as core • Source-based, wormhole routing, route headers coded for high-speed • Link level FIFOs and per-hop flow control • Handplaced M7/M8 routed over L2 cache minimizes wiring channel area • Regular design accelerates time to market: build/verify one Tile, then replicate © 2013 Tilera Corporation. 7 Network Name Link Width, Bisection BW SDN 128b, 346 GB/s RDN 112b, 302 GB/s QDN 64b, 173 GB/s Hot Interconnects August 2013 iMesh: Caching and Coherence • Every Tile contains private 32KB L1 I and D caches and 256KB L2 cache PCIe 2.0 8 Lanes Flexible I/O MiCA mPIPE PCIe 2.0 4 Lanes PCIe 2.0 4 Lanes TRIO 24-Lanes PCIe 2.0 8 Lanes 32-Lanes UART 4x GbE SGMII 10 GbE XAUI 4x GbE SGMII 10 GbE XAUI 4x GbE SGMII 10 GbE XAUI 4x GbE SGMII 10 GbE XAUI 4x GbE SGMII 10 GbE XAUI 4x GbE SGMII 10 GbE XAUI 4x GbE SGMII 10 GbE XAUI 4x GbE SGMII 10 GbE XAUI SerDes SerDes SerDes SerDes SerDes SerDes SerDes SerDes • L2 caches private L2 lines and global “L3” lines • Distributed coherence directory tracks sharers, invalidates shared copies on writes Network I/O MiCA • Flexible cache line distribution © 2013 Tilera Corporation. 8 Hot Interconnects August 2013 iMesh Protocol - Remote Write Hit with Sharer Tile 0 Directory Tile 1 TAG DATA TAG(P) 0x0 Directory Store V, 0x1 V (a virtual address) 1 TLB TAG DATA TAG(P) 0x0 Tile 2 Directory TAG DATA 0, 1 TAG(P) 0x0 P (a physical address), Home Tile coordinates DDR3 © 2013 Tilera Corporation. 9 Hot Interconnects August 2013 iMesh Protocol - Remote Write Hit with Sharer Tile 0 Directory TAG TAG(P) Tile 1 6 DATA 2 0x42 0x1 Data invalidated! Directory Local copy is updated TAG DATA TAG(P) 0x0 Cache controller writesthrough to home on SDN Inval sent to sharer on QDN 3 Store V, 0x1 Directory V (a virtual address) 1 TLB 5 Tile 2 P (a physical address), Home Tile coordinates 4 TAG DATA 0, 1 TAG(P) Write is processed by home cache and sharer list is inspected 0x0 DDR3 © 2013 Tilera Corporation. 10 Hot Interconnects August 2013 iMesh Protocol - Remote Write Hit with Sharer Tile 0 Directory TAG TAG(P) Tile 1 6 DATA 2 0x42 0x1 Data invalidated! Directory Local copy is updated Store V, 0x1 0x42 Inval sent to sharer on QDN 5 Tile 2 Directory 1 TAG(P) 7 3 TLB DATA Inval ack sent over RDN Cache controller writesthrough to home on SDN V (a virtual address) TAG P (a physical address), Home Tile coordinates 4 TAG DATA 0, 1 TAG(P) Write is processed by home cache and sharer list is inspected 0x0 DDR3 © 2013 Tilera Corporation. 11 Hot Interconnects August 2013 iMesh Protocol - Remote Write Hit with Sharer Tile 0 Directory TAG TAG(P) Tile 1 6 DATA 2 0x42 0x1 Data invalidated! Directory Local copy is updated Cache controller writesthrough to home on SDN Write ack sent back to Tile 0 via RDN 3 9 1 TAG(P) 0x42 7 Inval sent to sharer on QDN Tile 2 Directory TLB DATA Inval ack sent over RDN Store V, 0x1 V (a virtual address) TAG P (a physical address), Home Tile coordinates 4 TAG 5 DATA 0,01 TAG(P) 0x42 0x1 Write is Tile 1 is removed processed by from sharers and home cache and new data is sharer list is written to cache! inspected 8 DDR3 © 2013 Tilera Corporation. 12 Hot Interconnects August 2013 iMesh at the Movies NP Arb NP Arb RR Arb, XY Route © 2013 Tilera Corporation. NP Arb 13 Hot Interconnects August 2013 Agenda • Overview • iMesh Architecture • Performance Results and Summary © 2013 Tilera Corporation. 14 Hot Interconnects August 2013 Performance Results H.264 Video Encode2 700 90 80 70 60 50 40 30 20 10 0 600 500 Fps Gbps TCP Throughput1 30x performance improvement 400 300 65x performance improvement 200 100 0 0 5 10 15 20 25 30 0 20 60 80 # cores # cores 1 40 Application Performance TCP Throughput 80Gbps, 30 cores h.264 1080p encode 22 channels, 72 cores Suricata IDS, policy based rule set3 13Gbps, 72 cores Packet forwarding with netfilter 80Gbps, 40 cores 512 connections, 1500B packets, “echo” server, full-duplex performance 2 “Toys_and_calendar” video sequence, 3Mbps, baseline profile rules, typical traffic 3 677 © 2013 Tilera Corporation. 15 Hot Interconnects August 2013 Summary The iMesh interconnect and cache coherence protocol enables Tilera’s Gx family of Manycore processors to efficiently scale performance from the TILE-Gx9 to the TILE-Gx72 1847-Ball BGA, 45 x 45 pin compatible, 1265-Ball BGA, 37.5 x 37.5 Gx-72 Gx-36 Gx-16 Gx-9 General Sampling In production SOFTWARE COMPATIBLE © 2013 Tilera Corporation. 16 Hot Interconnects August 2013 Thank You!
© Copyright 2024 Paperzz