Integration for Heterogeneous SoC Modeling
Y. Sophia Shao, Sam Xi, Gu-Yeon Wei, David Brooks
Harvard University

More accelerators
[Figure: Out-of-Core Accelerators, Maltiel Consulting estimates]
[Die photo from Chipworks] [Accelerators annotated by Sophia Shao @ Harvard] [Shao, et al., IEEE Micro]

Accelerator-CPU Integration: Today’s Conventional SoCs
• Easy to integrate lots of IP, simple accelerator design
• Hard to program and share data
[Figure labels: Core, L2 $ (repeated), L3 $; Acc #1 … Acc #n, each with a Scratchpad; DMA; On-Chip System Bus]

Accelerator Integration Trend
• Users design application-specific hardware accelerators.
• System vendors provide Host Service Layer with virtual memory and cache coherence support
  – Intel QuickAssist QPI-Based FPGA Accelerator Platform (QAP)
  – IBM POWER8’s Coherent Accelerator Processor Interface (CAPI)
[Figure labels: Main CPU/SoC (Core, L2 $, L3 $); FPGA or user-defined ASIC (Accelerator, Acc Agent); Host Service Layer]

Aladdin: A pre-RTL, Power-Performance Accelerator Simulator
• Inputs: Unmodified C-Code; Accelerator Design Parameters (e.g., # FU, mem. BW)
• Modeled: Accelerator Specific Datapath; Private L1/Scratchpad; Shared Memory/Interconnect Models
• Outputs: Power/Area; Performance
• “Accelerator Simulator”: Design Accelerator-Rich SoC Fabrics and Memory Systems
• “Design Assistant”: Understand Algorithmic-HW Design Space before RTL
[Figure labels: Flexibility, Programmability, Design Cost]

Aladdin Overview
• Inputs: C Code; Acc Design Parameters
• Optimization Phase: Optimistic IR → Initial DDDG → Idealistic DDDG
• Realization Phase: Program Constrained DDDG → Resource Constrained DDDG
• Outputs: Performance; Activity with Power/Area Models gives Power/Area
(DDDG: Dynamic Data Dependence Graph)

Aladdin Take-Away
• Compared to HLS and hand-written RTL for SHOC benchmarks and custom accelerator designs
  – Cycle Counts within 1%
  – Power within 5%
  – Area within 7%
• Large design space exploration (DSE) in minutes instead of hours/days with unmodified C/C++ algorithm description
• Limitations
  – Dynamic approach: Aladdin depends on realistic workload inputs
  – Algorithm dependent: Aladdin enables DSE/algorithm exploration

Aladdin enables pre-RTL simulation of accelerators with the rest of the SoC.
[Figure labels: Big Cores (gem5); Small Cores (gem5); GPU (GPGPU-Sim); Shared Resources and Memory Interface (Ruby/GARNET, DRAMSim2); Sea of Fine-Grained Accelerators]

gem5-Aladdin Integration
[Figure labels: Acc Datapath, Scratchpad, DMA Engine, TLB, Cache; CPU, Cache; LLC; DRAM]
[Figure labels: multiple Acc Datapaths, each with Scratchpad, DMA Engine, TLB, Cache; Acc Shared Cache; CPU, Cache; LLC; DRAM]
[Figure labels: Acc, Cache, Memory; CPU, Cache, Memory]

Heterogeneous SoC Modeling
• Increasing number of accelerators are integrated into both mobile SoCs and servers.
• gem5-Aladdin integration enables rapid design space exploration of future accelerator-centric platforms.
• Download Aladdin at http://vlsiarch.eecs.harvard.edu/aladdin
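
The Aladdin Overview slide describes going from an idealistic DDDG to a resource-constrained DDDG. The toy scheduler below is a minimal sketch of that idea, not Aladdin's actual implementation. Everything in it is an assumption made for illustration: the hand-written trace, the node names, single-cycle operation latency, and a single design parameter `num_fus` that caps how many operations issue per cycle.

```python
# Hypothetical dynamic trace: (node, list of nodes it depends on).
# A real flow would derive this from an instrumented run of the C code;
# here it is written by hand and listed in a valid topological order.
TRACE = [
    ("load_a", []),
    ("load_b", []),
    ("load_c", []),
    ("load_d", []),
    ("mul_ab", ["load_a", "load_b"]),
    ("mul_cd", ["load_c", "load_d"]),
    ("add", ["mul_ab", "mul_cd"]),
    ("store", ["add"]),
]


def idealistic_schedule(trace):
    """ASAP schedule with unlimited resources and 1-cycle operations."""
    cycle = {}
    for node, deps in trace:
        # A node issues one cycle after its latest predecessor (cycle 0 if none).
        cycle[node] = 1 + max((cycle[d] for d in deps), default=-1)
    return cycle


def resource_constrained_schedule(trace, num_fus):
    """Greedy list schedule that issues at most num_fus nodes per cycle."""
    if num_fus < 1:
        raise ValueError("num_fus must be at least 1")
    deps_of = dict(trace)
    issued_at = {}
    pending = [node for node, _ in trace]
    current = 0
    while pending:
        issued = []
        for node in pending:
            if len(issued) == num_fus:
                break
            # Ready only if every dependence issued in an earlier cycle.
            if all(d in issued_at and issued_at[d] < current for d in deps_of[node]):
                issued.append(node)
        for node in issued:
            issued_at[node] = current
        pending = [node for node in pending if node not in issued_at]
        current += 1
    return issued_at


def total_cycles(schedule):
    """Cycle count of a schedule whose first cycle is numbered 0."""
    return max(schedule.values()) + 1


if __name__ == "__main__":
    print("idealistic:", total_cycles(idealistic_schedule(TRACE)))
    # A tiny design space sweep over the assumed parameter.
    for num_fus in (1, 2, 4):
        cycles = total_cycles(resource_constrained_schedule(TRACE, num_fus))
        print(f"num_fus={num_fus}: {cycles} cycles")
```

On this toy trace the idealistic schedule takes 4 cycles, while the constrained schedule takes 8, 5, and 4 cycles for 1, 2, and 4 functional units. Sweeping a parameter like this over a fixed trace is the kind of design space exploration the slides describe, though a real tool would also model operation latencies, memory bandwidth, and power/area.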