Seminars & Colloquia
Ding Yuan, University of Illinois at Urbana-Champaign
"Diagnosing Production Failures with Better Logging Support"
Thursday, April 19, 2012, 09:00 AM
Location: 3211 EBII, NCSU Centennial Campus
Computer systems often fail in production environments. Because these failures directly affect customers, large computer system vendors typically have to invest significant resources in diagnosing them. Unfortunately, diagnosing production failures is notoriously difficult. Constrained by both privacy and cost concerns, software vendors often cannot reproduce such failures. Support engineers and developers therefore continue to rely on the logs printed by the run-time system to diagnose production failures. However, because today's logging is ad hoc, system logs are frequently insufficient for effective failure diagnosis.
In this talk, I will describe our work on improving software logging for better production failure diagnosis. One approach, LogEnhancer, uses a novel combination of program analysis and system techniques to collect additional information for each existing log message. Another approach, LogError, tackles the problem of "silent failures": failures for which no log message is printed. We applied LogEnhancer and LogError to a broad range of real software systems and found that improved software logging can significantly improve postmortem failure diagnosis. The insights we gained could also help programmers design their software for better failure diagnosability.
Ding Yuan is a Ph.D. candidate at the University of Illinois at Urbana-Champaign. He is also a visiting student at the University of California at San Diego. His research interests span the areas of systems, programming languages, compilers, and software engineering, with a focus on practical approaches to failure diagnosis. He has received two ASPLOS best paper nominations, an ACM SIGSOFT Distinguished Paper award, an Outstanding Teaching Assistant award, and a Saburo Muroga Fellowship. His research on failure diagnosis has been requested for release by large vendors including Cisco, EMC, Huawei, NetApp, and Qualcomm.
Host: Frank Mueller, CSC Dept