Introduction - Georgia Tech

ECE 8813a: Design & Analysis of Multiprocessor Interconnection Networks
Sudhakar Yalamanchili
School of Electrical and Computer Engineering
© Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated)
Course Content: Goals
• Coverage of basic concepts for high performance multiprocessor and many core interconnection networks
  ▪ Primarily link & data layer communication protocols
  ▪ Router architectures
• Understand established and emerging microarchitecture concepts and implementations
• Formal Analysis: deadlock & livelock
• Optimization
  ▪ Topology, power, latency, bandwidth, wiring, pin-out
ECE 8813a (2)
Course Outline
• Case Studies
• Operation through multiple switches: Topologies, Routing, and Optimization
  ▪ Direct, indirect, regular, irregular
• Formal models and analysis for deadlock and livelock freedom
• Operation through a single switch: Router micro-architectures
  ▪ Buffering, arbitration, scheduling, datapath
• Operation of a single link: switching and flow control
• Optimization: technology, congestion, reliability
Course Administration
• Instructor: Professor Sudhakar Yalamanchili
  ▪ Class webpage for contact information: www.ece.gatech.edu/users/sudha/academic/class/Networks/Spring2012
  ▪ Class material drawn from
    o “Interconnection Networks: An Engineering Approach,” J. Duato, S. Yalamanchili and L. Ni, Morgan Kaufmann (pubs.), 2003
    o “Principles and Practices of Interconnection Networks,” W. J. Dally and B. Towles, Morgan Kaufmann (pubs.)
    o Journal and conference publications
  ▪ Publicly available simulation infrastructure
• Note: This is a 2-3-3 class!
Course Administration (cont.)
• Midterm – 20% (February 22nd)
• Assignments – 40%
  ▪ Paper review
  ▪ Simulation exercise
• Research Project/Final Exam – 40%
  ▪ Paper in conference format
  ▪ Presentation TBD
• The last few weeks of the course will cover recent journal and conference papers
  ▪ Depending on timing/class size, one assignment will be a paper presentation
Project Deliverable Structure
• Project Proposal: March 12th
• Revised Proposal: March 28th
• Interim Report: April 11th
• Project Final Delivery: April 27th
• Project Examination: May 2nd (3rd period)
• Formats for each deliverable will be provided in advance
Planned Assignment Schedule
• Anticipate 4-5 assignments
• Programming assignments every two weeks starting January 16th
Course Outline
1. Flow Control
2. Switching Techniques
3. Topologies
4. Deadlock and Livelock Freedom
5. Router Architectures
6. Routing Algorithms
7. Network Optimization
8. Systems Impact of Networks
9. Case Studies
Technology Trends…
[Figure: bandwidth per router node (Gb/s, log scale from 0.1 to 10000) versus year (1985 to 2010). Plotted designs: Torus Routing Chip, Intel iPSC/2, J-Machine, CM-5, Intel Paragon XP, Cray T3D, MIT Alewife, IBM Vulcan, Cray T3E, SGI Origin 2000, AlphaServer GS320, IBM SP Switch2, Quadrics QsNet, Cray X1, Velio 3003, IBM HPS, SGI Altix 3000, Cray XT3, YARC, BlackWidow.]
Source: W. J. Dally, “Enabling Technology for On-Chip Networks,” NOCS-1, May 2007
Where is the Demand?
[Figure labels: area, power; throughput; performance; cost; cables, connectors, transceivers; latency, power]
Blurring the Boundary
• Use heterogeneous multi-core chips for embedded devices
  ▪ IBM Cell → gaming
  ▪ Intel IXP → network processors
  ▪ AMD Fusion
• Use large numbers of multicore processors to build supercomputers
  ▪ NVIDIA Fermi
    o Tsubame 2.0, Keeneland, Titan, Blue Waters
  ▪ IBM Blue Gene/P
• Interconnection networks are central all across the spectrum!
Intel Sandy Bridge
• Cache coherent shared memory
• Ring interconnect
From geeks3D.com
NVIDIA Fermi GF 100
• 4 Graphics Processing Clusters (GPCs) containing 4 SMs each
• Each SM has 32 ALUs, 4 SFUs, and 16 LS units
• Each ALU has access to 1024 32-bit registers (total of 128 kB per SM)
• Each SM has its own Shared Memory/L1 cache (64 kB total)
• Unified L2 cache (768 kB)
• Six 64-bit memory controllers (total 384 bits wide)
[Figure labels: ALU; Streaming multiprocessor (SM)]
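The figures quoted on this slide are mutually consistent, which a few lines of arithmetic can confirm. The sketch below uses only the numbers listed above; the variable names are illustrative, and 1 kB is assumed to mean 1024 bytes.

```python
# Sanity check of the Fermi GF100 figures quoted on the slide.
# All constants come from the slide; the names are illustrative.
GPCS = 4
SMS_PER_GPC = 4
ALUS_PER_SM = 32
REGS_PER_ALU = 1024
BYTES_PER_REG = 4            # 32-bit registers
MEM_CONTROLLERS = 6
BITS_PER_CONTROLLER = 64

total_sms = GPCS * SMS_PER_GPC
total_alus = total_sms * ALUS_PER_SM
regfile_kb_per_sm = ALUS_PER_SM * REGS_PER_ALU * BYTES_PER_REG // 1024  # assumes 1 kB = 1024 B
memory_bus_bits = MEM_CONTROLLERS * BITS_PER_CONTROLLER

print(total_sms, total_alus, regfile_kb_per_sm, memory_bus_bits)
# prints: 16 512 128 384
```

The 128 kB register file and 384-bit memory interface match the slide; the totals of 16 SMs and 512 ALUs follow from the per-GPC and per-SM counts.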
Intel TeraOp Die
• 2D Mesh
• Really a test chip
• Aggressive speed – multi-GHz links
From rj3sp.blogspot.com
On-Chip Networks
• Why are they different?
  ▪ Abundant bandwidth
  ▪ Power
  ▪ Wire length distribution
• Different functions
  ▪ Operand networks
  ▪ Cache memory
  ▪ Message passing
Topologies
[Figure panels: Binary Hypercube; Tori; Multistage Interconnection; Fat Tree. Node labels shown: 0000, 0001, 1110, 1111]
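In a binary hypercube, nodes carry binary labels, and two nodes are linked exactly when their labels differ in one bit. The sketch below illustrates that rule for 4-bit labels like those that survive from the figure (which panel they belonged to is an assumption here). The function names are hypothetical and not taken from the course material.

```python
def hypercube_neighbors(node, dims):
    """Labels adjacent to `node` in a `dims`-dimensional binary hypercube."""
    # Flipping bit d crosses the link in dimension d.
    return [node ^ (1 << d) for d in range(dims)]


def hypercube_distance(a, b):
    """Minimum hop count between two nodes: the Hamming distance of their labels."""
    return bin(a ^ b).count("1")


print([format(n, "04b") for n in hypercube_neighbors(0b0000, 4)])
# prints: ['0001', '0010', '0100', '1000']
print(hypercube_distance(0b0000, 0b1111))
# prints: 4
```

Each node has one neighbor per dimension, so a 16-node hypercube has degree 4 and a diameter of 4 hops.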
Router Microarchitecture: Example
High Radix Router Architecture (Cray Inc.)
S. Scott, D. Abts, J. Kim, W. J. Dally, “The BlackWidow High-Radix Clos Network,” Proceedings of ISCA 2006
Cray XT3
• 3D Torus interconnect
• HyperTransport + proprietary link/switch technology
• 45.6 GB/s switching capacity per switch
From craysupercomputers.com
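In a 3D torus, each node links to its two neighbors along each of three axes, with wraparound at the edges. The sketch below shows that neighbor rule and the resulting minimal hop count. The 4×4×4 size and the function names are hypothetical, chosen only for illustration, and do not describe an actual XT3 configuration.

```python
def torus_neighbors(coord, dims):
    """Six neighbors of `coord` in a 3D torus of size `dims` (wraparound links)."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % dims[axis]  # modulo gives the wraparound
            neighbors.append(tuple(n))
    return neighbors


def torus_hops(a, b, dims):
    """Minimal hop count: on each axis, take the shorter way around the ring."""
    return sum(min((x - y) % k, (y - x) % k) for x, y, k in zip(a, b, dims))


DIMS = (4, 4, 4)  # hypothetical size, not the XT3's actual configuration
print(torus_neighbors((0, 0, 0), DIMS))
# prints: [(3, 0, 0), (1, 0, 0), (0, 3, 0), (0, 1, 0), (0, 0, 3), (0, 0, 1)]
print(torus_hops((0, 0, 0), (3, 2, 1), DIMS))
# prints: 4
```

The wraparound links are what separate a torus from a mesh: node (0, 0, 0) reaches (3, 0, 0) in one hop rather than three.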
Blue Gene/L
From nersc.gov
From pisces.edu
Blue Gene/L Networks
• 3D torus
• Dual PPC node processor ASIC
• Multiple networks to satisfy distinct communication requirements
From http://www.research.ibm.com/journal/rd/492/gara.html
Some Industry Standards
• On-Chip
  ▪ Open Core Protocol
    o Really an interface standard
  ▪ AXI
  ▪ Opportunities for secret sauce/customization
• PCI Express
• AMD HyperTransport and Intel QuickPath (I/F moved on-chip)
• InfiniBand and 10G Ethernet
Let's Get Started…