Browse by author "Bosilca, George"
Showing items 1-2 of 2
Fault tolerance of MPI applications in exascale systems: The ULFM solution
Losada, Nuria; González, Patricia; Martín, María J.; Bosilca, George; Bouteiller, Aurelien; Teranishi, Keita (Elsevier BV * North-Holland, 2020-05) [Abstract] The growth in the number of computational resources used by high-performance computing (HPC) systems leads to an increase in failure rates. Fault-tolerant techniques will become essential for long-running ...
Local Rollback for Resilient MPI Applications With Application-Level Checkpointing and Message Logging
Losada, Nuria; Bosilca, George; Bouteiller, Aurelien; González, Patricia; Martín, María J. (Elsevier BV * North-Holland, 2019-02) [Abstract] The resilience approach generally used in high-performance computing (HPC) relies on coordinated checkpoint/restart, a global rollback of all the processes that are running the application. However, in many ...