Nagiza Samatova

Oak Ridge National Laboratory

"Towards Data Analytics at Petascale: Finding the Dots, Connecting the Dots, Understanding the Dots"

Thursday March 22, 2007 10:00 AM
Location: 3211, EB II NCSU Centennial Campus
Abstract: High-throughput experiments and ultrascale simulations in cosmology, astrophysics, climate, biology, and other domains are revolutionizing the way science is conducted. With this promise comes a problem: the answers to fundamental questions about the nature of the universe are subtly hidden in enormous quantities of data that routinely reach terabytes and petabytes. Computer science, mathematics, and statistics are increasingly becoming indispensable tools for scientists who aim to extract knowledge from petascale data. The emerging petascale volume and inherent complexity of the data, however, challenge the applicability of existing techniques for harnessing and controlling the intricacy of Nature.

In this talk, I will highlight some of the needs for, and our progress towards, petascale data analytics that would enable application scientists to find informative features (identifying the dots) and to link them (connecting the dots) in order to formulate fundamental principles governing complex natural phenomena (understanding the dots). The difficulty lies in dealing with millions of components and their billions of interactions on different spatial and temporal scales. Existing approaches are challenged by the curses of dimensionality, computational intractability, and noise. This talk will focus on how we could break these curses by exploiting their blessings through advanced theory, novel algorithms, and their scalable implementations. Specifically, I will present our pioneering research in (a) fixed-parameter tractability theory, (b) distributed and streaming data mining algorithms, and (c) an infrastructure for interactive and transparent parallel statistical computing with R. The application of these technologies to large-scale problems in astrophysics, climate, and biology will be presented.

Short Bio: Dr. Nagiza F. Samatova is a Senior Research Scientist in the Computational Biology Institute, Computer Science and Mathematics Division, of Oak Ridge National Laboratory. She received her B.S. degree in applied mathematics from Tashkent State University, Uzbekistan, in 1991, and two years later, at the age of 22, she defended her Ph.D. in mathematics at the Russian Academy of Sciences, Moscow. She also obtained an M.S. degree in Computer Science in 1998 from the University of Tennessee, Knoxville, USA. Dr. Samatova's expertise includes (a) computational biology, (b) high-performance data analytics, and (c) graph theory and algorithms. She is the author of over 75 publications, 1 book, and 2 patents.

Host: Xiaosong Ma, Computer Science

