Parsing numbers at a gigabyte per second

Speaker: Daniel Lemire

Date: Wednesday, May 12, 2021

Time: 3:00 PM to 4:00 PM (Note: all times are in the Eastern Time Zone)

Public: Yes

Location: https://mit.zoom.us/meeting/register/tJUrdOqopj8uHdO4gUyVMnfglOFEqIye_Je0 (Registration required, only if you haven't registered for this series before; please read IMPORTANT NOTE below)

Event Type: Seminar

Host: Julian Shun, MIT CSAIL

Contact: Linda Lynch, lindalynch@csail.mit.edu

Speaker URL: https://lemire.me/en/

Reminders to: fast-code-seminar@lists.csail.mit.edu, seminars@csail.mit.edu, pl@csail.mit.edu, commit@lists.csail.mit.edu

Reminder Subject: TALK: Parsing numbers at a gigabyte per second (Please read the IMPORTANT NOTE regarding registration)

******************IMPORTANT NOTE ABOUT REGISTRATION******************
- If you have already registered for any previous Fast Code Seminar on Zoom, please use the Zoom link that you have received in the past. This link will stay the same for all subsequent Fast Code seminars on Zoom. Please save it to your calendar!
- Zoom does not recognize a second registration, and will not send out the link a second time. The organizers will not be notified of any second registration.
- If you have any problems with registration, please contact lindalynch@csail.mit.edu by 2:30pm on the day of the seminar, so that we can try to resolve it before the seminar begins.
*********************************************************************

Abstract: Back when disks could barely provide a few megabytes per second of bandwidth, slow data processing software was acceptable. It is now time to revisit our performance expectations. With disks and networks providing gigabytes per second, parsing decimal numbers from strings becomes a bottleneck. We consider the problem of parsing decimal numbers to the nearest binary floating-point value. We present a C++ implementation that is often 4 times faster than the standard C library on modern 64-bit systems (Intel, AMD, ARM and POWER9). Our work is available as open source software and is used by major systems such as Apache Arrow and Yandex ClickHouse. The Go standard library has adopted a version of our approach. The Rust port of our library is much faster than the state-of-the-art lexical library.

Bio: Daniel is a computer science professor at the University of Quebec and a long-time blogger. He is @lemire on Twitter, and he blogs at https://lemire.me/. His 75 research papers have been cited about 4000 times. He is among the top 500 GitHub users in terms of follower count: https://github.com/lemire

Research Areas:
Programming Languages & Software, Systems & Networking

Impact Areas:
Big Data

See other events that are part of the Fast Code 2020-2021 series.

Created by Julian J. Shun on Sunday, May 02, 2021 at 1:32 PM.