Buscar
Mostrando ítems 1-10 de 12
Resilient MPI applications using an application-level checkpointing framework and ULFM
(Springer New York LLC, 2017-01)
[Abstract] Future exascale systems, formed by millions of cores, will present high failure rates, and long-running applications will need to make use of new fault tolerance techniques to ensure successful execution completion. ...
Analysis of Performance-impacting Factors on Checkpointing Frameworks: The CPPC Case Study
(Oxford University Press, 2011-11-01)
[Abstract] This paper focuses on the performance evaluation of Compiler for Portable Checkpointing (CPPC), a tool for the checkpointing of parallel message-passing applications. Its performance and the factors that impact ...
Improving Scalability of Application-Level Checkpoint-Recovery by Reducing Checkpoint Sizes
(Springer Japan KK, 2013)
[Abstract] The execution times of large-scale parallel applications on nowadays multi/many-core systems are usually longer than the mean time between failures. Therefore, parallel applications must tolerate hardware failures ...
In-memory application-level checkpoint-based migration for MPI programs
(Springer New York LLC, 2014)
[Abstract] Process migration provides many benefits for parallel environments including dynamic load balancing, data access locality or fault tolerance. This paper describes an in-memory application-level checkpoint-based ...
CPPC: a compiler‐assisted tool for portable checkpointing of message‐passing applications
(John Wiley & Sons Ltd., 2010-11-19)
[Abstract] With the evolution of high‐performance computing toward heterogeneous, massively parallel systems, parallel applications have developed new checkpoint and restart necessities. Whether due to a failure in the ...
PARamrfinder: detecting allele-specific DNA methylation on multicore clusters
(Springer, 2024-01)
[Abstract]: The discovery of Allele-Specific Methylation (ASM) is an important research field in biology as it regulates genomic imprinting, which has been identified as the cause of some genetic diseases. Nevertheless, ...
Local Rollback for Resilient Mpi Applications With Application-Level Checkpointing and Message Logging
(Elsevier BV * North-Holland, 2019-02)
[Abstract]
The resilience approach generally used in high-performance computing (HPC) relies on coordinated checkpoint/restart, a global rollback of all the processes that are running the application. However, in many ...
Fault tolerance of MPI applications in exascale systems: The ULFM solution
(Elsevier BV * North-Holland, 2020-05)
[Abstract]
The growth in the number of computational resources used by high-performance computing (HPC) systems leads to an increase in failure rates. Fault-tolerant techniques will become essential for long-running ...
Fiuncho: a program for any-order epistasis detection in CPU clusters
(Springer, 2022)
[Abstract]: Epistasis can be defined as the statistical interaction of genes during the expression of a phenotype. It is believed that it plays a fundamental role in gene expression, as individual genetic variants have ...
Reducing the overhead of an MPI application-level migration approach
(Elsevier BV * North-Holland, 2016)
[Abstract] Process migration provides many benefits for parallel environments including dynamic load balance, data access locality, or fault tolerance. This work proposes a solution that reduces the memory and I/O overhead ...