Seminars & Colloquia

Mayank Goswami

Queens College, City University of New York

"Searching in the Age of Big Data: Filters, Nearest Neighbor Search, and the Role of Geometry"

Monday March 15, 2021 09:00 AM
Location: Zoom, EB2 NCSU Centennial Campus
Zoom Meeting Info
(Visitor parking instructions)

 

Abstract: Searching is a fundamental problem in computer science, which has gained particular importance in today’s age of big data. While most people are familiar with classical binary search trees, it may come as a surprise to some that we still do not completely understand them. In the first part of this talk I will describe the (still open) dynamic optimality conjecture, postulated by Sleator and Tarjan in 1985. Although a question on binary trees, it can be reformulated in geometric terms. This has enabled us to make progress on several fronts recently, some of which I will survey.

 

 

We will then move on to faster, hashing-based methods, particularly in the context of big-data. Most data today can’t be stored on a single machine, thus necessitating the need for a) succinct sketches to store on RAM, and b) I/O-efficient algorithms (also known as external memory algorithms). I will talk about several variants of Bloom Filters, a sketching-based data structure ubiquitous in databases today. While Bloom Filters solve the approximate membership problem (is the query in the database?), the related similarity search problem (is something similar to the query in the database?) has gained a lot of importance, and I will describe latest results on the development of such similarity filters. This is closely related to the k- nearest neighbor (kNN) problem and I will end with results on its I/O-complexity and applications to machine learning.

Short Bio: Mayank Goswami is an Associate Professor at Queens College in the City University of New York. He received his Ph.D. in Applied Mathematics and Statistics from Stony Brook University in 2013. From 2013 to 2016, he was a researcher at the Max-Planck Institute for Informatics in Germany.His research expertise is in mathematical foundations of big data, with a focus on theoretical guarantees for space and time efficient algorithms. His major work is on membership and similarity-search, with particular emphasis on filters for near-neighbor search in high dimensions. His other research interests include computational geometry, wireless networks, machine learning, and complex geometry. His research is supported by CRII mini-CAREER and AF Small awards from the National Science Foundation.

Host: Anthony Fortune-Linton, CSC


Back to Seminar Listings
Back to Colloquia Home Page