Showing items 1-3 of 3

Improving Scalability of Application-Level Checkpoint-Recovery by Reducing Checkpoint Sizes

Cores González, Iván; Rodríguez, Gabriel; Martín, María J.; González, Patricia; Osorio, Roberto (Springer Japan KK, 2013)

[Abstract] The execution times of large-scale parallel applications on today's multi/many-core systems are usually longer than the mean time between failures. Therefore, parallel applications must tolerate hardware failures ...

In-memory application-level checkpoint-based migration for MPI programs

Cores González, Iván; Rodríguez, Gabriel; Martín, María J.; González, Patricia (Springer New York LLC, 2014)

[Abstract] Process migration provides many benefits for parallel environments including dynamic load balancing, data access locality or fault tolerance. This paper describes an in-memory application-level checkpoint-based ...

Failure Avoidance in MPI Applications Using an Application-Level Approach

Cores González, Iván; Rodríguez, Gabriel; González, Patricia; Martín, María J. (Oxford University Press, 2014)

[Abstract] Execution times of large-scale computational science and engineering parallel applications are usually longer than the mean time between failures. For this reason, hardware failures must be tolerated by the ...
