Adam J. Ferrari, Computer Science, University of Virginia
Abstract: In this talk I consider the problem of process state capture and recovery in high-performance heterogeneous distributed systems, sometimes called metasystems. A general-purpose process state capture and recovery mechanism for such an environment is most useful if it is platform-independent: process state captured on a host of one architecture or operating system should be recoverable on hosts supporting different architectures or operating systems. The central feature of my approach, called Process Introspection, is the automatic transformation of programs to incorporate state capture and recovery functionality. This program modification is performed at a platform-independent level of code representation, and preserves the original program semantics.
The attractive properties of this approach include ease of use and flexibility with respect to basic performance trade-offs and application-specific requirements. Furthermore, this solution meets the demanding portability and heterogeneity requirements of metacomputing environments: the core mechanisms require no external system support and no non-portable code. Experimental results obtained using a prototype implementation indicate that Process Introspection can be applied automatically to computationally demanding scientific applications with very low run-time overhead (typically below 10%) while providing efficient state capture and recovery.
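To make the idea in the abstract concrete, here is a minimal hand-written sketch of what a transformed routine might look like. It is not taken from the prototype. The function names, the `capture_at` parameter (a stand-in for an external capture request), and the byte layout are all assumptions made for illustration, and Python is used only for brevity. The sketch shows the two pieces the abstract describes: a poll point inside the computation that saves the live variables in an architecture-neutral encoding, and a recovery path that restores them and resumes.

```python
import struct

# Hypothetical wire format: two signed 64-bit integers in network byte order,
# so the bytes mean the same thing on any host architecture.
STATE_FORMAT = "!qq"


class CaptureRequested(Exception):
    """Carries the encoded state out of the interrupted computation."""

    def __init__(self, blob):
        super().__init__("state captured")
        self.blob = blob


def sum_squares(n, saved=None, capture_at=None):
    """Transformed version of a simple loop summing i*i for i in range(n).

    saved      -- bytes from an earlier capture, or None for a fresh start
    capture_at -- stand-in for an external capture request: the iteration
                  at which the poll point fires (None means never)
    """
    if saved is None:
        i, total = 0, 0
    else:
        # Recovery: restore the live variables, then fall into the loop.
        i, total = struct.unpack(STATE_FORMAT, saved)

    while i < n:
        # Poll point that a transformation of this kind would insert.
        if capture_at is not None and i == capture_at:
            raise CaptureRequested(struct.pack(STATE_FORMAT, i, total))
        total += i * i
        i += 1
    return total


def run_with_migration(n, capture_at):
    """Capture mid-run, then resume from the captured bytes."""
    try:
        return sum_squares(n, capture_at=capture_at)
    except CaptureRequested as request:
        # In a real setting the bytes could be shipped to a different host.
        return sum_squares(n, saved=request.blob)
```

Calling `run_with_migration(10, 4)` interrupts the loop at iteration 4 and finishes it from the saved bytes, giving the same result as an uninterrupted `sum_squares(10)`. Where the poll point sits in the loop is the kind of performance trade-off the abstract mentions: polling more often lets a capture request be served sooner, at the price of more checks during normal execution.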
Short Bio: See the home page under Adam J. Ferrari.