Thesis Defense: Bridging High Efficiency and Low Latency in Datacenters
, MIT CSAIL
Date: Wednesday, December 14, 2016
Time: 1:00 PM to 2:00 PM (Eastern Time Zone)
Refreshments: 2:30 PM
Host: Professor Daniel Sanchez, MIT CSAIL
Contact: Cree Bruins, 617-253-2629, email@example.com
Datacenters today host a diverse set of applications, ranging from scientific computing and business analytics to massive online services such as search engines and social media. Despite recent advances, however, datacenters continue to be plagued by low resource and energy efficiency, with server utilizations of 10-50% being typical. This low utilization wastes billions of dollars in infrastructure and, since servers are not energy-proportional, terawatt-hours of energy annually. This thesis proposes novel hardware and software techniques to improve datacenter efficiency while satisfying the disparate performance needs of applications.
Low server utilization in datacenters stems in large part from the stringent performance requirements of latency-critical applications, which form the backbone of user-facing, interactive services. These applications require strict bounds on tail latency, often a few milliseconds or less, and must be run at low utilization to guard against short-term load spikes. Ideally, other applications can be colocated with latency-critical ones to improve resource utilization, while techniques such as dynamic voltage and frequency scaling can be used to minimize power consumption. This is unfortunately not possible on current systems, which are designed to maximize long-term average throughput but do not provide short-term performance guarantees.
We propose two techniques to improve resource and power efficiency for this class of applications. First, Ubik uses dynamic cache partitioning to safely colocate latency-critical applications with throughput-oriented batch applications. Ubik uses an analytical model to predict performance transients as application cache partitions are resized, an effect we call performance inertia. Leveraging these transients allows Ubik to provide latency guarantees while simultaneously maximizing batch throughput. Second, Rubik uses a lightweight statistical model of queued work to perform fine-grain dynamic voltage and frequency scaling for latency-critical applications. Rubik adapts frequencies in response to the short-term load changes inherent in the operation of latency-critical applications, reducing dynamic power consumption, and enables more aggressive resource sharing between latency-critical and batch applications.
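To make the fine-grain DVFS idea concrete, the following is a minimal sketch of frequency selection driven by instantaneous queue state. It assumes a simplistic deterministic work-drain model and hypothetical frequency levels and parameters; Rubik itself uses a statistical model of queued work, so this is an illustration of the control loop's shape, not the thesis's implementation.

```python
# Illustrative sketch of queue-driven fine-grain DVFS (all names, frequency
# levels, and the deterministic drain model are assumptions for exposition).

FREQS_GHZ = [1.2, 1.6, 2.0, 2.4, 2.8]  # hypothetical available DVFS states

def min_frequency(queue_len, cycles_per_req, tail_budget_us):
    """Pick the lowest frequency that drains the queued requests plus one
    in-flight request within the tail-latency budget (microseconds)."""
    work_cycles = (queue_len + 1) * cycles_per_req
    for f in FREQS_GHZ:  # ascending: prefer the lowest adequate frequency
        latency_us = work_cycles / (f * 1e3)  # cycles / GHz, scaled to us
        if latency_us <= tail_budget_us:
            return f
    return FREQS_GHZ[-1]  # saturated: run at the highest frequency

# On each request arrival or completion, the controller would re-evaluate:
# an empty queue tolerates a low frequency, a load spike forces a high one.
```

The key point this illustrates is that frequency is chosen per short-term queue state rather than from long-term average load, which is what lets dynamic power track the load troughs between spikes.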
Our third technique, Shepherd, targets throughput-oriented batch applications, also common in datacenters. These applications typically enjoy higher utilization since they are easier to colocate. However, interference in shared resources such as the last-level cache and DRAM bandwidth can hurt their throughput. Shepherd mitigates this performance degradation via coordinated resource partitioning and cluster scheduling. Using node-local cache-partitioning decisions to guide cluster scheduling allows Shepherd to colocate applications with complementary resource requirements, significantly boosting the effectiveness of cache partitioning.
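The idea of using node-local partitioning decisions to guide scheduling can be sketched as follows. This is a toy illustration under assumed inputs (per-application miss curves over cache ways, greedy pairwise colocation); the names and the pairing policy are hypothetical, not Shepherd's actual algorithm.

```python
# Illustrative sketch of partition-guided colocation (hypothetical names and
# policy; assumes each app's miss curve over allocated cache ways is known).
from itertools import combinations

def best_split_misses(curve_a, curve_b):
    """Given miss curves indexed by allocated ways (curve[w] = misses with
    w ways), return the minimum combined misses over all splits of the
    shared cache between the two colocated applications."""
    total_ways = len(curve_a) - 1  # curves cover 0..W ways
    return min(curve_a[w] + curve_b[total_ways - w]
               for w in range(total_ways + 1))

def pair_apps(curves):
    """Greedily colocate applications in pairs so that each pair's best
    partitioned miss total is as low as possible."""
    remaining = set(curves)
    pairs = []
    while len(remaining) > 1:
        a, b = min(combinations(remaining, 2),
                   key=lambda p: best_split_misses(curves[p[0]], curves[p[1]]))
        pairs.append((a, b))
        remaining -= {a, b}
    return pairs
```

For example, a cache-hungry application whose misses fall steeply with more ways pairs well with a streaming application whose flat miss curve cedes cache capacity for free, which is the "complementary resource requirements" effect the abstract describes.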
These techniques span the system stack from microarchitecture to software runtimes and cluster schedulers. A common theme across these techniques is an emphasis on accurate analytical modeling to guide resource allocation decisions, eschewing heuristics often used in prior work. We find that this approach enables aggressive resource management while satisfying applications' disparate performance goals. Together, these techniques constitute a robust foundation for designing hardware and software systems for future datacenters.
Created by Cree Bruins on Thursday, December 08, 2016 at 5:32 PM.