Showing items 1-2 of 2

Improving Scalability of Application-Level Checkpoint-Recovery by Reducing Checkpoint Sizes

Cores González, Iván; Rodríguez, Gabriel; Martín, María J.; González, Patricia; Osorio, Roberto (Springer Japan KK, 2013)

[Abstract] The execution times of large-scale parallel applications on today's multi/many-core systems are usually longer than the mean time between failures. Therefore, parallel applications must tolerate hardware failures ...

Compiler-Assisted Checkpointing of Parallel Codes: The Cetus and LLVM Experience

Rodríguez, Gabriel; Martín, María J.; González, Patricia; Touriño, Juan; Doallo, Ramón (Springer New York LLC, 2013)

[Abstract] With the evolution of high-performance computing, parallel applications have developed an increasing necessity for fault tolerance, most commonly provided by checkpoint and restart techniques. Checkpointing tools ...