Seminars & Colloquia
Shivnath Babu, Department of Computer Science, Duke University
"MADDER and Self-Tuning Data Analytics on Hadoop with Starfish"
Friday April 27, 2012 11:00 AM
Location: 3211, EB2 NCSU Centennial Campus
(Visitor parking instructions)
This talk is part of the System Research Seminar series
Timely and cost-effective analytics over 'big data' is now a key ingredient for success in businesses and scientific disciplines. The Hadoop platform (an extensible MapReduce execution engine, pluggable distributed storage engines, and a range of procedural to declarative interfaces for expressing analysis tasks) is an emerging choice for big data analytics. However, Hadoop's out-of-the-box performance can be poor, leading to suboptimal use of resources, time, and money (e.g., in pay-as-you-go clouds). Unfortunately, practitioners of big data analytics, such as business analysts, computational scientists, and researchers, often lack the expertise to tune the Hadoop platform for good performance.
I will introduce Starfish, a self-tuning system for big data analytics. Starfish builds on Hadoop and adapts to system workloads and user needs to provide good performance automatically, without requiring users to understand and manipulate the many tuning knobs in the Hadoop platform. While Starfish's design is guided by work on self-tuning database systems, I will discuss how new analysis practices over big data (dubbed the MADDER principles) pose new challenges, leading us to different design choices in Starfish. Starfish is under active development and is available at: http://www.cs.duke.edu/starfish
Shivnath Babu is an Assistant Professor of Computer Science at Duke University. He received his Ph.D. from Stanford University in 2005. His honors include a U.S. National Science Foundation CAREER Award and three IBM Faculty Awards. His research interests are in ease-of-use and manageability of data-intensive computing systems, automated problem diagnosis and cluster sizing for systems running on cloud platforms, and automated detection of, and recovery from, data corruption caused by hardware faults, software bugs, or human mistakes.
Host: Xiaosong Ma, Computer Science