Thesis Defense: Efficient Physiological Time Series Retrieval and Prediction via Locality-Sensitive Hashing

Speaker: Y. Bryce Kim , Anyscale Learning for All Group, CSAIL

Date: Tuesday, May 09, 2017

Time: 3:00 PM to 4:00 PM Note: all times are in the Eastern Time Zone

Public: Yes

Location: 32-D463 (Star)

Event Type:

Room Description:

Host: Una-May O'Reilly, Anyscale Learning for All Group, CSAIL

Contact: Bryce Kim,

Relevant URL:

Speaker URL: None

Speaker Photo:

Reminders to:

Reminder Subject: TALK: Thesis Defense: Efficient Physiological Time Series Retrieval and Prediction via Locality-Sensitive Hashing

Abstract: The amount of time series data collected in the medical community has recently been exploding due to widespread affordable sensors and storage devices. However, while it provides enormous opportunities for machine learning to make significant impacts, the massive repositories of such physiological time series data are largely untapped due to their granular detail and overwhelming size. Besides scale, fast yet accurate processing of physiological waveform data is desired in medical practice, especially in time-critical settings such as the intensive care unit (ICU). Efficiently leveraging these massive datasets is a key challenge that, when resolved, will support a new paradigm of scientific discovery and operational innovation in medicine.

In this thesis, we develop highly efficient similarity based methods that make it practical to search massive physiological time series repositories to rapidly identify waveforms similar to those from a given individual. We call this concept "patients with trajectories like mine." Our goal is to exploit rapid similar waveform retrieval to enable critical event prediction in the ICU setting. In order to achieve this goal, we propose to apply locality-sensitive hashing (LSH), which supports a very fast approximate nearest neighbor search in high dimensions. We empirically demonstrate that LSH based retrieval and prediction methods vastly speed up querying time while sacrificing only a trivial amount of accuracy as a cost.

Despite being fast and accurate, the generic LSH has two shortcomings. First, it is capable of utilizing only one similarity measure at a time. To overcome this limit, we introduce Stratified LSH (SLSH) which finds similarity among the data from a more integrated perspective by employing multiple distance metrics in one framework. SLSH is essentially a dual-level hierarchical LSH where each LSH layer is associated with a distinct distance metric capturing a unique facet of similarity. The second shortcoming and the main bottleneck of the generic LSH is that it involves exhaustive distance calculations as a subroutine when short-listing the candidate set to find the final nearest neighbors. To surmount this, we propose Collision Frequency LSH (CFLSH) which short-lists the candidate set by simply counting the frequency of collision based on the key idea that the more frequently an element and a query collide across multiple LSH hash tables, the more similar they are. We show that with SLSH and CFLSH, we improve the efficiency of LSH in terms of both prediction accuracy and querying speed.

We demonstrate our proposed methods on the mean arterial blood pressure dataset extracted from the MIMIC II database in the context of predicting acute hypotensive episodes in ICU. To examine the generality of our methods with respect to scaling, we validate our methods on datasets with various dimensions and item counts.

Research Areas:

Impact Areas:

This event is not part of a series.

Created by Bryce Kim Email at Thursday, May 04, 2017 at 3:24 PM.