Compiler-Assisted Checkpointing of Parallel Codes: The Cetus and LLVM Experience

Use this link to cite
http://hdl.handle.net/2183/20943
Collections
- Investigación (FIC) [1618]
Metadata
Title
Compiler-Assisted Checkpointing of Parallel Codes: The Cetus and LLVM Experience
Date
2013
Citation
Rodríguez, G., Martín, M.J., González, P. et al. Int J Parallel Prog (2013) 41: 782. https://doi.org/10.1007/s10766-012-0231-8
Abstract
With the evolution of high-performance computing, parallel applications have developed an increasing need for fault tolerance, most commonly provided by checkpoint and restart techniques. Checkpointing tools are typically implemented at one of two abstraction levels: the system level or the application level. The latter has become an interesting alternative due to its flexibility and its ability to operate in different environments. However, application-level checkpointing tools often require the user to insert checkpoints manually in order to ensure that certain requirements are met (e.g. forcing checkpoints to be taken in user code rather than inside kernel routines). This paper examines the transformations required to enable automatic checkpointing of parallel applications in the CPPC application-level checkpointing framework. These transformations have been implemented on two very different compiler infrastructures: Cetus and LLVM. Cetus is a Java-based compiler infrastructure that aims to provide a clean, easy-to-use IR and API for program transformation. LLVM is a low-level, SSA-based toolchain. The fundamental differences between the two approaches are analyzed from structural, behavioral, and performance perspectives.
Keywords
Fault tolerance
Checkpointing
Parallel programming
Message passing
Compiler support
Cetus
LLVM
Description
This is a post-peer-review, pre-copyedit version of an article published in International Journal of Parallel Programming. The final authenticated version is available online at: https://doi.org/10.1007/s10766-012-0231-8
ISSN
0885-7458
1573-7640