Seminars & Colloquia
Wayne State University
"Design and Implementation of Effective Key-Value Systems for Large-scale Data Centers"
Wednesday, March 16, 2016, 09:00 AM
Location: 3211, EBII NCSU Centennial Campus
Data management systems in large-scale data centers are designed for high performance, scalability, and reliability. They play important roles in supporting Internet-wide data-centric computing. A principle critical to their success is to design according to workload characteristics: the general-purpose, one-size-fits-all approach once used in small-scale systems is no longer cost-effective. Examples of modern, carefully engineered systems include Google’s GFS file system, Facebook’s Haystack photo storage, and Baidu’s Atlas cloud storage system.
In this talk we will describe how rigorous workload characterization is used to design and implement key-value (KV) systems for large-scale data centers. In collaboration with Facebook, our team collected week-long KV access traces from Facebook’s production Memcached system and systematically analyzed the workloads they capture. This study revealed distinct access patterns with significant implications for KV system design: (1) very small KV items are widespread; (2) accesses to the KV cache are highly skewed towards a small set of hot keys; and (3) access traffic can be highly dynamic, with request traffic varying by a factor of two.
Using our understanding of real-world workloads, we designed and implemented the high-performance and resource-efficient zExpander KV cache and the LSM-trie KV store system. We will detail how the design of each system was motivated by its targeted workload. Evaluation results show substantially, sometimes dramatically, improved performance over other state-of-the-art systems. As one example, the LSM-trie system can improve the read and write throughputs of Google’s LevelDB by up to 10 and 20 times, respectively. We will conclude with a brief overview of our ongoing projects and future visions.
Dr. Song Jiang is currently an associate professor in the ECE department at Wayne State University. His research interests include system infrastructure for big data processing, such as file and storage systems and data management systems, as well as I/O systems for high-performance computing. He received a 2009 US National Science Foundation (NSF) CAREER award, and his research activities have been continuously supported by the NSF. He has served on numerous conference program committees and proposal review panels. He has collaborated on projects at Facebook and Baidu aimed at providing high-quality Internet-wide services based on big data, resulting in many significant publications at top-tier conferences. Dr. Jiang’s research has had substantial impact in industry, where several of his proposed algorithms for memory and storage management have been officially adopted into mainstream systems, including the Linux kernel, the NetBSD kernel, and the storage engine of MySQL. He received his B.S. and M.S. from the University of Science and Technology of China, and his Ph.D. in computer science from the College of William and Mary in 2004. From 2004 to 2006 he was a post-doctoral researcher at the Los Alamos National Laboratory, where his research work was cited at the national level as a “success story” of the NNSA Laboratory Directed Research and Development program. More information about his research can be found at http://www.ece.eng.wayne.edu/~sjiang/.
Host: Dr. Frank Mueller, CSC