In-memory application-level checkpoint-based migration for MPI programs

Bibliographic citation

Cores, I., Rodríguez, G., Martín, M.J. et al. J Supercomput (2014) 70: 660. https://doi.org/10.1007/s11227-014-1120-2

Type of academic work

Academic degree

Abstract

Process migration provides many benefits for parallel environments, including dynamic load balancing, data access locality, and fault tolerance. This paper describes an in-memory, application-level, checkpoint-based migration solution for MPI codes that uses the Hierarchical Data Format 5 (HDF5) to write the checkpoint files. The main features of the proposed solution are transparency for the user, achieved through the use of CPPC (ComPiler for Portable Checkpointing); portability, since the application-level approach makes the solution suitable for any MPI implementation and operating system, and the HDF5 file format enables restart on different architectures; and high performance, achieved by saving the checkpoint files to memory instead of to disk through HDF5 in-memory files. Experimental results show that the in-memory approach significantly reduces the I/O cost of the migration process.

Description

This is a post-peer-review, pre-copyedit version of an article published in Journal of Supercomputing. The final authenticated version is available online at: https://doi.org/10.1007/s11227-014-1120-2

Rights