Showing items 1-10 of 22
Resilient MPI applications using an application-level checkpointing framework and ULFM
(Springer New York LLC, 2017-01)
[Abstract] Future exascale systems, formed by millions of cores, will present high failure rates, and long-running applications will need to make use of new fault tolerance techniques to ensure successful execution completion. ...
Analysis of Performance-impacting Factors on Checkpointing Frameworks: The CPPC Case Study
(Oxford University Press, 2011-11-01)
[Abstract] This paper focuses on the performance evaluation of Compiler for Portable Checkpointing (CPPC), a tool for the checkpointing of parallel message-passing applications. Its performance and the factors that impact ...
Improving Scalability of Application-Level Checkpoint-Recovery by Reducing Checkpoint Sizes
(Springer Japan KK, 2013)
[Abstract] The execution times of large-scale parallel applications on today's multi/many-core systems are usually longer than the mean time between failures. Therefore, parallel applications must tolerate hardware failures ...
In-memory application-level checkpoint-based migration for MPI programs
(Springer New York LLC, 2014)
[Abstract] Process migration provides many benefits for parallel environments including dynamic load balancing, data access locality or fault tolerance. This paper describes an in-memory application-level checkpoint-based ...
Failure Avoidance in MPI Applications Using an Application-Level Approach
(Oxford University Press, 2014)
[Abstract] Execution times of large-scale computational science and engineering parallel applications are usually longer than the mean-time-between-failures. For this reason, hardware failures must be tolerated by the ...
CPPC: a compiler‐assisted tool for portable checkpointing of message‐passing applications
(John Wiley & Sons Ltd., 2010-11-19)
[Abstract] With the evolution of high‐performance computing toward heterogeneous, massively parallel systems, parallel applications have developed new checkpoint and restart necessities. Whether due to a failure in the ...
Compiler-Assisted Checkpointing of Parallel Codes: The Cetus and LLVM Experience
(Springer New York LLC, 2013)
[Abstract] With the evolution of high-performance computing, parallel applications have developed an increasing necessity for fault tolerance, most commonly provided by checkpoint and restart techniques. Checkpointing tools ...
On processing extreme data
(Universitatea de Vest din Timisoara, West University of Timisoara, 2016)
[Abstract] Extreme Data is an incarnation of the Big Data concept distinguished by the massive amounts of data that must be queried, communicated and analyzed in near real-time by using a very large number of memory or storage ...
Extending an Application-Level Checkpointing Tool to Provide Fault Tolerance Support to OpenMP Applications
(Technische Universitaet Graz, Institut fuer Informationssysteme und Computer Medien; Graz University of Technology, Institute for Information Systems and Computer Media, 2014-09)
[Abstract] Despite the increasing popularity of shared-memory systems, there is a lack of tools for providing fault tolerance support to shared-memory applications. CPPC (ComPiler for Portable Checkpointing) is an ...
A Portable and Adaptable Fault Tolerance Solution for Heterogeneous Applications
(Academic Press, 2017-06)
[Abstract] Heterogeneous systems have increased their popularity in recent years due to the high performance and reduced energy consumption capabilities provided by using devices such as GPUs or Xeon Phi accelerators. This ...