AMD's Direct Connect Architecture

Pat Conway
Principal Member of Technical Staff

2006 Workshop on On- and Off-Chip Interconnection Networks for Multicore Systems, 6-7 December 2006, Stanford, California

This poster illustrates the variety of Opteron system topologies built to date using HyperTransport and AMD's Direct Connect Architecture for glueless multiprocessing, and summarizes the latency/bandwidth characteristics of these systems. Building and using Opteron systems has provided useful, real-world lessons that expose the challenges to be addressed when designing future system interconnect, memory hierarchy and I/O to scale up both the number of cores and sockets in future x86 CMP Architectures.

4 Node Opteron MP Architecture

[Figure: Glueless Multiprocessing with Opteron CMP processors. Four nodes, each with Core 0, Core 1, SRI, XBAR, MCT, HT and HT-HB blocks, plus DRAM and I/O. Labels: "NorthBridge"; 12.8 GB/s, 128-bit (DRAM); 4.0 GB/s per direction @ 2 GT/s Data Rate (HT); cHT; ncHT.]

Lessons Learned #1
Memory Latency is the Key to Application Performance!

[Figure: Performance vs Average Memory Latency (single 2.8GHz core, 400MHz DDR2 PC3200, 2GT/s HT with 1MB cache in MP system). Axes: Processor Performance, System Performance. Series: OLTP1, OLTP2, SW99, SSL, JBB. Configurations: 1N, 2N, 4N (SQ), 8N (TL), 8N (L).]

Topology             AvgD        Latency
1 Node               0 hops      x + 0ns
2 Node               0.5 hops    x + 17ns (47 cpuclk)
4 Node Square        1 hop       x + 44ns (124 cpuclk)
8N Twisted Ladder    1.5 hops    x + 76ns (214 cpuclk)
8N Ladder            1.8 hops    x + 105ns (234 cpuclk)

"Fully Connected 4 Node and 8 Node Topologies are becoming Feasible"

Additional HyperTransport™ Ports
• Enable Fully Connected 4 Node (four x16 HT) and 8 Node (eight x8 HT)
• Reduced network diameter
  – Fewer hops to memory
• Increased Coherent Bandwidth
  – more links
  – cHT packets visit fewer links
  – HyperTransport3 up to 5.2GT/s
• Benefits
  – Low latency because of lower diameter
  – Evenly balanced utilization of HyperTransport links
  – Low queuing delays
Result: low latency under load

[Figure: Native Dual-Core processor with SRQ, Crossbar, Mem.Ctrlr and HT links at 8 GB/s each, + 2 extra links w/ HyperTransport3, configurable as four x16 HT or eight x8 HT.]

Torrenza - HyperTransport-based Accelerators
Imagine it, Build it

• Open platform for system builders
  – 3rd Party Accelerators
  – Media
  – FLOPs
  – XML
  – SOA
• AMD Opteron™ Socket or HTX slot
• HyperTransport interface is an open standard, see www.hypertransport.org
• Coherent HyperTransport interface available if the accelerator caches system memory (under license)
• Homogeneous/heterogeneous computing
• Architectural enhancements planned to support fine grain parallel communication and synchronization

[Figure: Torrenza platform with SOA, XML and FLOPs accelerators, other optimized silicon, PCI-E bridge, I/O hub, USB and PCI, attached over 8 GB/s links.]

Fully Connected 4 Node Performance

[Figure: 4 Node Square and 4 Node fully connected topologies (P0 to P3, x16 links, I/O at each node).]

Topology                           Diam   Avg Diam   XFIRE BW
4N SQ (2GT/s HyperTransport)       2      1.00       14.9GB/s
4N FC (2GT/s HyperTransport)       1      0.75       29.9GB/s (2X)
4N FC (4.4GT/s HyperTransport3)    1      0.75       65.8GB/s (4X)

XFIRE ("crossfire") BW is the link-limited all-to-all communication bandwidth (data only).

Fully Connected 8 Node Performance

[Figure: 8N Twisted Ladder, 8 Node 6HT 2x4 and 8 Node Fully Connected topologies (P0 to P7, x16 and x8 links, I/O at each node).]

Topology                           Diam   Avg Diam   XFIRE BW
8N TL (2GT/s HyperTransport)       3      1.62       15.2GB/s
8N 2x4 (4.4GT/s HyperTransport3)   2      1.12       72.2GB/s (5X)
8N FC (4.4GT/s HyperTransport3)    1      0.88       94.4GB/s (6X)
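
The Diam and Avg Diam figures quoted for the 4 Node and 8 Node topologies can be cross-checked with a short breadth-first search. The sketch below is illustrative and not taken from the poster. It assumes that Avg Diam is the mean hop count over all source/destination pairs, including a node's access to its own local memory (0 hops), and that the square and ladder are wired as their names suggest. All function and constant names are hypothetical.

```python
from collections import deque


def hop_counts(adj, src):
    """Breadth-first search: minimum hop count from src to every node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def topology_metrics(edges, n):
    """Return (diameter, average hop count) for an n-node topology.

    Assumption: the average runs over all n*n (source, destination)
    pairs, so local accesses contribute 0 hops.
    """
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    all_hops = [h for src in range(n) for h in hop_counts(adj, src).values()]
    return max(all_hops), sum(all_hops) / (n * n)


def fully_connected(n):
    """Every node has a direct link to every other node."""
    return [(a, b) for a in range(n) for b in range(a + 1, n)]


# Assumed wiring: a 4-node ring (square) and a 2x4 ladder with
# rails P0-P2-P4-P6 and P1-P3-P5-P7 joined by four rungs.
SQUARE_4N = [(0, 1), (1, 3), (3, 2), (2, 0)]
LADDER_8N = [(0, 2), (2, 4), (4, 6), (1, 3), (3, 5), (5, 7),
             (0, 1), (2, 3), (4, 5), (6, 7)]

if __name__ == "__main__":
    cases = [
        ("4N SQ", SQUARE_4N, 4),
        ("4N FC", fully_connected(4), 4),
        ("8N Ladder", LADDER_8N, 8),
        ("8N FC", fully_connected(8), 8),
    ]
    for name, edges, n in cases:
        diam, avg = topology_metrics(edges, n)
        print(f"{name}: Diam {diam}, Avg Diam {avg}")
```

Under these assumptions the sketch gives an average of 1.0 for the square, 0.75 for the fully connected 4 Node system, 0.875 for the fully connected 8 Node system and 1.75 for the ladder, in line with the 1.00, 0.75, 0.88 and 1.8 hops quoted above. The twisted ladder and 2x4 topologies are omitted because their exact wiring is not given here.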