Extending an Application-Level Checkpointing Tool to Provide Fault Tolerance Support to OpenMP Applications

Bibliographic citation

Losada, N., Martín, M. J., Rodríguez, G., & González, P. (2014). Extending an Application-Level Checkpointing Tool to Provide Fault Tolerance Support to OpenMP Applications. Journal of Universal Computer Science, 20(9), 1352-1372.

Abstract

Despite the increasing popularity of shared-memory systems, there is a lack of tools for providing fault tolerance support to shared-memory applications. CPPC (ComPiler for Portable Checkpointing) is an application-level checkpointing tool focused on the insertion of fault tolerance into long-running MPI applications. This paper presents an extension to CPPC that allows the checkpointing of OpenMP applications. The proposed solution maintains the main characteristics of CPPC: portability and reduced checkpoint file size. The performance of the proposal is evaluated using the OpenMP NAS Parallel Benchmarks, showing that most of the applications incur small checkpoint overheads.
