Bringing Time-sensitivity To Distributed Systems and Networks

Speaker: Balaji Prabhakar, Departments of Electrical Engineering and Computer Science, Stanford University

Date: Friday, October 06, 2023

Time: 1:00 PM to 2:00 PM (Note: all times are in the Eastern Time Zone)

Location: Seminar Room G882 (Hewlett Room)

Host: Mohammad Alizadeh, MIT CSAIL

Distributed Systems and Packet-Switched Networks were developed in the 1970s under a "clockless design" paradigm. This was mainly due to the difficulty of accurately synchronizing clocks over jittery packet-switched networks, and it caused a bifurcation whose effects are felt to this day: widely used commodity networks (such as those in public clouds) offer a "best effort" service, while "high-performance" or "time-sensitive" service requires networks built on specialized hardware.

Imagine clocks can be accurately synchronized at scale and at distance without the need for specialized hardware. What implications would this have for Distributed Systems and Networking?

We describe Huygens, a high-accuracy, software-based network clock synchronization system, and show how it can be used to transform jittery and unpredictable public cloud infrastructure into deterministic, time-sensitive systems. We discuss two main applications: (1) building fair financial exchanges in the public cloud, and (2) transforming standard TCP+Ethernet networks into "zero-loss, zero-delay" networks. The latter are useful for speeding up LLM training and inference on TCP+Ethernet. We conclude by mentioning a new time-sensitive consensus protocol that has significantly higher throughput than Raft/Paxos.

Bio: Balaji Prabhakar is VMware Founders Professor of Computer Science and a faculty member in the Departments of Electrical Engineering and Computer Science, and, by courtesy, in the Graduate School of Business at Stanford University. His research interests are in computer networks, notably Data Center Networks and Cloud Computing Platforms. He has also worked on Societal Networks, where he has developed "nudge engines" to influence commuter behavior.

Balaji has been a Terman Fellow at Stanford University, and a Fellow of the Alfred P. Sloan Foundation, IEEE, and ACM. He has received the NSF CAREER Award, the Erlang Prize from the INFORMS Applied Probability Society, and the Rollo Davidson Prize given to young Statisticians and Probabilists, and has delivered the Lunteren Lectures of the Dutch Operations Research Society. He is the inaugural recipient of the IEEE Innovation in Societal Infrastructure Award, which recognizes "significant technological achievements and contributions to the establishment, development and proliferation of innovative societal infrastructure systems." He has also received the IEEE Koji Kobayashi and the ACM SIGMETRICS Awards for his work on Computer Networks. From 2005 to 2007 he was Switch Architect at Nuova Systems (acquired by Cisco Systems in 2008), where he developed the fabric scheduling and line card algorithms of Cisco's Nexus 5000 family of data center Ethernet switches. In 2011 he co-founded Urban Engines (acquired by Google in 2016) and is currently a co-founder of

Systems & Networking

