Seminars & Colloquia

Martin Schulz

Lawrence Livermore National Laboratory

"Developing New Tool Strategies for Scalable HPC Systems"

Monday May 21, 2007 10:00 AM
Location: 3211, EB2 NCSU Centennial Campus

This talk is part of the System Research Seminar series

 

Abstract: Current high-end HPC cluster systems are starting to scale beyond the 10,000-processor mark, and some systems, like Blue Gene/L, have already reached over 130,000 processors. This trend towards higher processor counts will continue and lead to petascale systems in the foreseeable future.

The increasing number of CPUs affects not only applications but also the development environment around them. In particular, tools need to keep up with the applications' scalability. This includes collecting and analyzing data, finding the relevant information, and presenting it to the user in a comprehensible way.

In this talk I will show how we address these issues within the DOE/ASC (Advanced Simulation and Computing) program. In particular, I will focus on two tool sets we recently developed: STAT helps users gather and analyze distributed stack traces to quickly focus debugging efforts, and P^nMPI enables users to dynamically compose MPI tool chains from existing components and to customize their scope. Both tool sets give us new insights into scalable systems and are part of a new generation of tools capable of providing efficient support for next-generation HPC machines.

Short Bio: Martin Schulz is a Computer Scientist at the Center for Applied Scientific Computing (CASC) at Lawrence Livermore National Laboratory (LLNL). His research interests include parallel and distributed architectures and applications; performance monitoring, modeling, and analysis; memory system optimization; parallel programming paradigms; tool support for parallel programming; and fault tolerance at the application and system level.

Martin earned his doctorate in Computer Science in 2001 from the Technische Universitaet Muenchen (Munich, Germany). He also holds a Master of Science in Computer Science from the University of Illinois at Urbana-Champaign. After completing his graduate studies and a postdoctoral appointment in Munich, he worked for two years as a Research Associate at Cornell University before joining LLNL in 2004. His current projects include the design and development of performance tools for large-scale parallel systems, in particular Open|SpeedShop; application and communication optimizations for Blue Gene/L; the use of machine learning techniques for performance analysis and modeling; and scalable debugging. He is a member of the ACM and the IEEE Computer Society.

Host: Frank Mueller, Computer Science, NCSU

