Thesis Defense: Enabling dedicated single-cycle connections over a shared Network-on-Chip
, MIT CSAIL
Date: Monday, October 07, 2013
Time: 10:00 AM to 12:00 PM (all times are in the Eastern Time Zone)
Refreshments: 9:45 AM
Host: Professor Li-Shiuan Peh, MIT CSAIL + MTL
Contact: Maria Rebelo, 617-253-5895, email@example.com
TALK: Thesis Defense: Enabling dedicated single-cycle connections over a shared Network-on-Chip
Adding multiple processing cores on the same chip has become the de facto design choice as we continue to extract more performance per watt from our chips in every technology generation. In this context, the interconnect fabric ("Network-on-Chip") connecting the cores becomes critically important. High on-chip latency can create performance bottlenecks and limit scalability. Conventional wisdom thus says that communication is expensive and network traversals should be avoided.
This dissertation challenges this conventional wisdom. We show that on-chip networks can be designed to provide extremely low latencies and handle bursts of high-bandwidth traffic, thus reversing the trade-offs one typically associates with local vs. remote cache access latencies, or broadcast vs. directory-based coherence protocols. The thesis progressively builds a network-on-chip fabric that dynamically creates single-cycle network paths across multiple hops, for both unicast and collective (1-to-Many and Many-to-1) communication flows.
The goal of this thesis is to approach an "ideal" interconnect (dedicated point-to-point wires between all source-destination pairs) over a network-on-chip with shared links. We start with a prototype chip demonstrating single-cycle per-hop traversals over a mesh network-on-chip. This design is then enhanced to support 1-to-Many (multicast) traffic flows by intelligent forking at network routers within a cycle, and is validated by a prototype chip. The design is further enhanced to support Many-to-1 (acknowledgement) traffic by intelligent aggregation at network routers within a cycle. Finally, we leverage repeated wires on the data path (which can traverse 10+ mm within 1 ns) and propose a dynamic cycle-by-cycle network reconfiguration methodology to provide single-cycle traversals across the network. The focus of this talk will be on this single-cycle multi-hop network, which we call SMART. SMART enables single-cycle traversals across 9-11 hops at 1 GHz, leading to a 5-8x latency reduction across traffic patterns compared to a single-cycle per-hop network on a 64-core chip. With a SMART network-on-chip, the full-system runtime of a suite of applications across broadcast and directory-based coherence protocols is found to be within 13% of that provided by the ideal contention-free all-to-all single-cycle network. Going forward, we believe that the ideas proposed in this thesis can pave the way for locality-oblivious shared-memory design.
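To give a feel for why covering several hops per cycle matters, the following is a minimal Python sketch of a zero-load latency comparison on a k x k mesh. It is not the thesis's evaluation methodology. Everything in it is an illustrative assumption: the function names, the XY (dimension-ordered) route, the rule that a turn between dimensions starts a new cycle, and the parameter hpc_max (maximum hops covered per cycle). It ignores contention, path setup, and router pipeline details, so it should not be expected to reproduce the 5-8x figure quoted above.

```python
from itertools import product
from math import ceil


def baseline_cycles(src, dst):
    """Cycles for a one-cycle-per-hop mesh: the Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def multihop_cycles(src, dst, hpc_max):
    """Cycles when up to hpc_max hops are covered per cycle.

    Assumed model (hypothetical simplification): the X and Y segments
    of the route are traversed separately, each in ceil(hops / hpc_max)
    cycles, with no contention and no setup cost.
    """
    dx = abs(src[0] - dst[0])
    dy = abs(src[1] - dst[1])
    return ceil(dx / hpc_max) + ceil(dy / hpc_max)


def average_cycles(k, cost):
    """Average cost over all ordered source/destination pairs (src != dst)."""
    nodes = list(product(range(k), repeat=2))
    pairs = [(s, d) for s in nodes for d in nodes if s != d]
    return sum(cost(s, d) for s, d in pairs) / len(pairs)


if __name__ == "__main__":
    k = 8          # 8 x 8 mesh, i.e. 64 nodes
    hpc_max = 9    # assumed value, taken from the low end of "9-11 hops"
    base = average_cycles(k, baseline_cycles)
    fast = average_cycles(k, lambda s, d: multihop_cycles(s, d, hpc_max))
    print(f"one cycle per hop : {base:.2f} cycles on average")
    print(f"multi-hop per cycle: {fast:.2f} cycles on average")
```

Under these assumptions, a corner-to-corner route on the 8 x 8 mesh takes 14 cycles at one cycle per hop and 2 cycles when each dimension can be crossed in a single cycle.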
Thesis Committee: Li-Shiuan Peh (advisor), Joel Emer, Srini Devadas
Created by Maria Rebelo on Tuesday, September 24, 2013 at 12:51 PM.