"DD BOOST: A Distributed Deduplication API"

Abstract: How can you make a backup copy of a 10 Gigabyte file over a 1 Gigabyte/sec network link at a rate of 6 – 8 Gigabytes/sec without violating the speed of light or other laws of physics? And how can you store that backup copy on the storage server in as little as 100 Megabytes? DD BOOST is an Application Programming Interface (API) library that does just that by supporting distributed deduplication: client systems segment and fingerprint the file data, and the deduplicating storage server determines which data actually needs to be sent to it. Suppressing the transmission of duplicate data dramatically increases throughput compared to writing via NFS, with a median reduction in network traffic of 90 – 99%. It can also reduce the amount of storage space needed to store backups by a factor of 10 – 100. We will describe how DD BOOST does all of this and share some of our experiences from several years of customer usage.
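
To make the division of labor in the abstract concrete, here is a minimal sketch of distributed deduplication in Python. It is not the DD BOOST API: the names `ToyServer`, `backup`, `segment`, and `fingerprint` are hypothetical, and the fixed 4096-byte segments and SHA-256 fingerprints are assumptions chosen for simplicity. The client segments and fingerprints the data, the server reports which fingerprints it does not yet hold, and only those segments cross the "network".

```python
import hashlib

SEGMENT_SIZE = 4096  # assumption: fixed-size segments, chosen for simplicity


def segment(data: bytes, size: int = SEGMENT_SIZE) -> list:
    """Client side: split file data into segments."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprint(seg: bytes) -> str:
    """Client side: fingerprint a segment (SHA-256 is an assumption)."""
    return hashlib.sha256(seg).hexdigest()


class ToyServer:
    """Hypothetical stand-in for a deduplicating storage server."""

    def __init__(self):
        self.segments = {}  # fingerprint -> segment bytes (stored once)
        self.files = {}     # file name -> ordered list of fingerprints

    def missing(self, fingerprints):
        """Report which fingerprints the server does not already hold."""
        return {fp for fp in fingerprints if fp not in self.segments}

    def store(self, name, recipe, new_segments):
        """Store only the new segments plus the file's fingerprint recipe."""
        self.segments.update(new_segments)
        self.files[name] = recipe

    def read(self, name):
        """Rebuild a file from its recipe."""
        return b"".join(self.segments[fp] for fp in self.files[name])


def backup(server, name, data):
    """Back up data, returning the number of segment bytes actually sent."""
    segs = segment(data)
    recipe = [fingerprint(s) for s in segs]
    needed = server.missing(recipe)  # only fingerprints travel in this step
    to_send = {fp: s for fp, s in zip(recipe, segs) if fp in needed}
    server.store(name, recipe, to_send)
    return sum(len(s) for s in to_send.values())


if __name__ == "__main__":
    server = ToyServer()
    monday = b"".join(bytes([i]) * SEGMENT_SIZE for i in range(4))
    tuesday = monday[:SEGMENT_SIZE] + b"\xff" * SEGMENT_SIZE + monday[2 * SEGMENT_SIZE:]
    print(backup(server, "monday", monday))    # all four segments are new
    print(backup(server, "tuesday", tuesday))  # only the changed segment is sent
```

In this toy run the second backup transfers a single segment even though the file is four segments long, which is the effect that lets the logical backup rate exceed the raw link speed when most of the data is already on the server.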


Short Bio: The Dynamic Duo of Donna (“Bat Woman”) Lewis & Andy (“Batty Man”) Huber have each worked for 6 years at Dell Technologies/EMC on various features of DD BOOST, including interface design, integration with applications, performance, security and high availability. Before Dell Technologies, they teamed up on software for embedded systems that implemented IPsec, IKE and other security protocols, and in their more distant pasts they worked for other leading computer companies such as IBM & Data General on things like network chip development and operating systems. Donna is a native Floridian and a graduate of Florida State University’s FSU/FAMU College of Engineering. She is a member of the Anita Borg Systers Community and a proud mother of two NCSU students. Andy is a Buckeye from Ohio with degrees from MIT. He is an ACM and IEEE Computer Society member and has been a mentor on several NCSU Senior Design Projects sponsored by Dell.

Host: David Sturgill, CSC

