Anuj Kalia

Thesis Title: Efficient Remote Procedure Calls for Datacenters
Degree Type: Ph.D. in Computer Science
Advisor(s): David Andersen
Graduated: December 2019

Abstract:

Datacenter network latencies are approaching their microsecond-scale speed-of-light limit, and network bandwidths continue to grow beyond 100 Gbps. These improvements bear rethinking the design of communication-intensive distributed systems for datacenters, whose performance has historically been limited by slow networks. With the slowing down of Moore's law, a popular approach is to redesign distributed systems to use network hardware devices and technologies that offload communication or data access from commodity CPUs, such as smart network cards (NICs), lossless networks, programmable NICs, and programmable switches.

In this dissertation, we show that we can continue to use end-to-end software-only communication mechanisms to build high-performance distributed systems, i.e., we bring the speed of fast networks to distributed systems without an expensive redesign with in-network hardware offloads. We show that the ubiquitous Remote Procedure Call (RPC) communication mechanism, when rearchitected specially for the capabilities of modern commodity datacenter hardware, is a fast, scalable, flexible, and simple communication choice for distributed systems. We make three contributions. First, we present a detailed analysis of datacenter communication hardware—ranging from the peripheral bus that connects CPUs to NICs, to the datacenter's switched network—that informs our choice of the communication mechanism. Second, we lay out the advantages of RPCs over network hardware offloads through the design and evaluation of two new systems, a key-value store called HERD, and a distributed transaction processing system called FaSST. Third, we combine the lessons learned from the first two steps with new insights about datacenter packet loss and congestion control to create a new RPC library called eRPC, and show how existing distributed system codebases perform well over eRPC. In many cases, these systems substantially outperform offloads because they use less communication, and their end-to-end design provides exibility and simplicity.

Thesis Committee:
David G. Andersen (Chair)
Justine Sherry
Michael Kaminsky
MIguel Castro (Microsoft Research, Cambridge)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Keywords:
Datacenter networks, distributed systems, Remote Procedure Calls, Remote Direct Memory Access, eRPC

CMU-CS-19-126.pdf (1.77 MB) ( 144 pages)
Copyright Notice