Seminars & Colloquia
The University of New Mexico
"Integrating Performance Modeling into Scalable System Software Design"
Thursday March 03, 2016 09:00 AM
Location: 3211, EBII NCSU Centennial Campus
Developing new scalable system software techniques is essential to the success of emerging large-scale scientific computing systems due to the increasing scale and complexity of hardware, programming systems, and applications. In particular, HPC operating systems and middleware must address challenges in areas such as fault tolerance, scheduling, synchronization, power management, and high-speed communication. Interactions between these areas also complicate software design; recent research has shown, for example, that both power capping and asynchronous checkpointing can have widely varying and hard-to-predict impacts on system performance.
Because of these challenges, my research has increasingly relied on performance modeling to expose research challenges, quantify performance tradeoffs, and evaluate the resulting system. This aspect of the research is challenging and rewarding because it requires understanding the underlying system, the strengths and limitations of different modeling approaches developed by the modeling community, and how to best integrate these techniques into system software design. In some cases, my students and I have been able to use simple analytical models; recently, however, we have been relying on more sophisticated stochastic modeling techniques. We have also begun exploring the viability of using large-scale computational models to inform the design of HPC system software.
In this talk, I will discuss several systems research projects my students and I have conducted to meet HPC system software challenges in the areas of resilience, scheduling, and communication system design. In each of these areas, I will describe both the research itself and how modeling techniques have informed it. Finally, I will briefly discuss some new research directions we are currently exploring and offer some thoughts on the broader integration of modeling and evaluation in computer systems research and education.
Patrick G. Bridges is an Associate Professor and Associate Department Chair of the Computer Science Department at the University of New Mexico. His research focuses on system software for large-scale computer systems, including research on operating system design and implementation, communication system optimization, and fault tolerance and resilience. Dr. Bridges received his Ph.D. from the University of Arizona in 2002, working under the direction of Professor Rick Schlichting, and his B.S. from Mississippi State University in 1994. He is a member of the Association for Computing Machinery and the IEEE Computer Society.
Host: Dr. Frank Mueller, CSC