Everyone Loves File: File Storage Service (FSS) in Oracle Cloud Infrastructure.
Bradley C. Kuszmaul
Date: Monday, November 09, 2020
Time: 2:00 PM to 3:00 PM (Note: all times are in the Eastern Time Zone)
Location: https://mit.zoom.us/meeting/register/tJUrdOqopj8uHdO4gUyVMnfglOFEqIye_Je0 (Registration required, if you haven't registered for this series before)
Event Type: Seminar
Host: Julian Shun, MIT CSAIL
Contact: Julian Shun, email@example.com, firstname.lastname@example.org
Speaker URL: https://people.csail.mit.edu/bradley/
TALK: Everyone Loves File: File Storage Service (FSS) in Oracle Cloud Infrastructure.
Oracle File Storage Service (FSS) is an elastic filesystem
provided as a managed NFS service. A pipelined Paxos implementation
underpins a scalable block store that provides linearizable multipage
limited-size transactions. Above the block store, a scalable B-tree
holds filesystem metadata and provides linearizable multikey
limited-size transactions. Self-validating B-tree nodes and
housekeeping operations performed as separate transactions allow each
key in a B-tree transaction to require only one page in the underlying
block transaction. The filesystem provides snapshots by using
versioned key-value pairs. The system is programmed using a
nonblocking lock-free programming style. Presentation servers
maintain no persistent local state, making them scalable and easy to
fail over. A non-scalable Paxos-replicated hash table holds
configuration information required to bootstrap the system. An
additional B-tree provides conversational multi-key minitransactions
for control-plane information. The system throughput can be predicted
by comparing an estimate of the network bandwidth needed for
replication to the network bandwidth provided by the hardware.
Latency on an unloaded system is about 4 times higher than that of a Linux NFS
server backed by NVMe, reflecting the cost of replication. FSS
has been in production since January 2018, and holds tens of thousands
of customer file systems comprising many petabytes of data.
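
To illustrate the idea of snapshots built from versioned key-value pairs, here is a minimal Python sketch. It is not the FSS implementation: the class name VersionedStore, its methods, and the integer version counter are all hypothetical, and the sketch ignores deletes, B-tree storage, replication, and transactions. It only shows how keeping every value tagged with a version lets a reader see the state as of an earlier snapshot.

```python
import bisect


class VersionedStore:
    """Toy in-memory versioned key-value map (illustrative only, not FSS)."""

    def __init__(self):
        # key -> (sorted list of versions, parallel list of values)
        self._history = {}
        # Version stamped on new writes; bumped each time a snapshot is taken.
        self._current = 1

    def put(self, key, value):
        versions, values = self._history.setdefault(key, ([], []))
        if versions and versions[-1] == self._current:
            # Overwrite within the same version: no snapshot can see the old value.
            values[-1] = value
        else:
            versions.append(self._current)
            values.append(value)

    def snapshot(self):
        """Freeze the current version and return its identifier."""
        snap = self._current
        self._current += 1
        return snap

    def get(self, key, snapshot=None):
        """Return the newest value whose version is <= the requested snapshot."""
        if key not in self._history:
            return None
        versions, values = self._history[key]
        at = self._current if snapshot is None else snapshot
        i = bisect.bisect_right(versions, at)
        return values[i - 1] if i else None
```

In this sketch, taking a snapshot copies no data: it only advances the version counter, and a later write to the same key appends a new version rather than overwriting the old one, so a read against the snapshot still finds the earlier value.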
Note: I'm at Google now, and this talk describes work I did at Oracle.
Bradley C. Kuszmaul received his Ph.D. from MIT in 1994. His research
has focused on developing computer systems that behave well both in
practice and in theory. In 1987 he took a year off from graduate
school to act as one of the principal architects of the Connection
Machine CM-5 supercomputer. When he returned to MIT as a student in
1990, he co-developed the MIT Cilk multithreaded programming
environment and several prize-winning parallel chess programs. In
1995 he joined the faculty at Yale, where he developed the theory of
asymptotically optimal superscalar processors. In 1999 he joined
Akamai Technologies as a Senior Research Scientist, where he
contributed to the network communications infrastructure and then led
the development of the network usage database. In 2002 he returned to
MIT as a research scientist, where he investigated scalable
transactional memory and practical cache-oblivious storage systems.
In 2016 he joined Oracle, where he was one of the architects for the
Oracle File Storage Service, and in 2020 he joined Google. He is a
member of the IEEE and ACM.
IMPORTANT NOTE FOR ATTENDEES: If you have already registered for the Fast Code Seminars on Zoom since July 27, 2020, please use the Zoom link that you have received. This link will stay the same for subsequent Fast Code seminars this semester. Zoom does not recognize a second registration, and will not send out the link a second time. If you have any problems with registration, please contact email@example.com and firstname.lastname@example.org by 1:30pm on the day of the seminar, so that we can try to resolve it before the seminar begins.
Algorithms & Theory, Programming Languages & Software, Systems & Networking
Created by Julian J. Shun on Wednesday, November 04, 2020 at 12:12 PM.