Genomic Analysis and Learning at Scale: Mapping Irregular Computations to Advanced Architectures

Speaker: Katherine Yelick, UC Berkeley and Lawrence Berkeley National Laboratory

Date: Monday, August 03, 2020

Time: 2:00 PM to 3:00 PM (all times are in the Eastern Time Zone)

Public: Yes

Location: (Registration required)

Event Type: Seminar

Room Description: (Registration required)

Host: Julian Shun, MIT CSAIL

Contact: Julian Shun

Abstract: Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become widely available. Enormous community databases store and share this data with the research community, and some of the data analysis problems require large-scale computational platforms to meet both their memory and computational requirements. These computations range from the analysis and correction of raw genomic data to higher-level machine learning approaches. These applications differ from scientific simulations and place different requirements on programming support, software libraries, and parallel architecture design. For example, they involve irregular communication patterns such as asynchronous updates to shared data. The ExaBiome project, part of the Exascale Computing Project, is developing high-performance tools for analyzing microbial data, which is especially challenging because hundreds of species may be collected and sequenced in a single sample from a human, animal, or environmental microbiome. I will give an overview of several high-performance genomics analysis problems, including alignment, profiling, clustering, and assembly, and describe some of the challenges and opportunities in mapping these to current petascale and future exascale architectures, including GPU-based systems. I will also describe some of the common computational patterns, or “motifs,” that inform parallelization strategies and can be useful in understanding architectural requirements, algorithmic approaches, and benchmarking of current and future systems.

Biography: Katherine Yelick is the Robert S. Pepper Distinguished Professor of Electrical Engineering and Computer Sciences and the Associate Dean for Research in the Division of Computing, Data Science, and Society (CDSS) at the University of California, Berkeley. She is also a Senior Advisor on Computing at Lawrence Berkeley National Laboratory. Her research is in high-performance computing, programming systems, parallel algorithms, and computational genomics, and she currently leads the ExaBiome project on Exascale Solutions for Microbiome Analysis.

Yelick was Director of the National Energy Research Scientific Computing Center (NERSC) from 2008 to 2012 and led the Computing Sciences Area at Berkeley Lab from 2010 through 2019, where she oversaw NERSC, the Energy Sciences Network (ESnet), and the Computational Research Division. She earned her Ph.D. in Electrical Engineering and Computer Science from MIT and has been a professor at UC Berkeley since 1991, with a joint research appointment at Berkeley Lab since 1996. Yelick is a member of the National Academy of Engineering and the American Academy of Arts and Sciences. She is a Fellow of the Association for Computing Machinery (ACM) and the American Association for the Advancement of Science (AAAS). She is a recipient of the ACM/IEEE Ken Kennedy Award and the ACM-W Athena Award.

Research Areas:
AI & Machine Learning, Computational Biology, Systems & Networking

Impact Areas:
Big Data

This event is part of the Fast Code 2020-2021 seminar series.

Created by Julian J. Shun on Tuesday, July 28, 2020 at 2:20 PM.