Seminars & Colloquia

Dominik Kempa

Johns Hopkins University

"How to store massive sequence collections in a searchable form"

Friday March 12, 2021 09:00 AM
Location: N/A, EB2 NCSU Centennial Campus
Zoom Meeting Info
(Visitor parking instructions)

 

Abstract: Compressed indexing is concerned with the design and construction of data structures to store massive sequences in space close to the size of compressed data, while simultaneously providing searching functionality (such as pattern matching) on the original uncompressed data. These indexes have a huge impact on the analysis of large-scale DNA databases (containing hundreds of thousands of genomes) but until recently the size of many indexes lacked theoretical guarantees and their construction remains a computational bottleneck.

 

In this talk I will describe my work proving theoretical guarantees on index size as a function of compressed data size, resolving a key open question in this field. Related to that, I will also describe new algorithms for converting between two conceptually distinct compressed representations, Lempel-Ziv and the Burrows-Wheeler Transform. I will show how these new findings enable advanced computation directly on compressed data. I will conclude by describing avenues for future work, including the new algorithms for the construction of compressed indexes that have the potential to cut indexing time by further orders of magnitude, unlocking the ability to search truly enormous text or DNA datasets.

 

Short Bio: Dominik Kempa is currently a Postdoctoral Fellow at the Johns Hopkins University. Prior to joining JHU, he was a Postdoctoral Scholar at the University of California in Berkeley. He obtained his PhD in Computer Science at the University of Helsinki.

His research addresses the problem of storing and searching terabyte-scale sequential datasets and is achieved by the combination of information theory, data compression, and combinatorial pattern matching. His work combines design & analysis of algorithms with engineering and implementation of practical tools. He has published nearly 40 papers including at important theory (STOC, FOCS, SODA) and experimental (ACM JEA, ALENEX, SEA) venues. His PhD thesis received the Outstanding Doctoral Dissertation Award at the University of Helsinki and he was selected the Junior Researcher of the year in the Department of Computer Science. In 2019, he was an invited speaker at the Dagstuhl Seminar '25 Years of the Burrows-Wheeler Transform', and this year he will give a keynote talk at the Symposium on Experimental Algorithms (SEA 2021).

Host: Anthony Fortune-Linton, CSC


Back to Seminar Listings
Back to Colloquia Home Page