Search
Showing items 1-10 of 39
Resilient MPI applications using an application-level checkpointing framework and ULFM
(Springer New York LLC, 2017-01)
[Abstract] Future exascale systems, formed by millions of cores, will present high failure rates, and long-running applications will need to make use of new fault tolerance techniques to ensure successful execution completion. ...
A Grid Portal for an Undergraduate Parallel Programming Course
(Institute of Electrical and Electronics Engineers, 2005-08)
[Abstract] This paper describes an experience of designing and implementing a portal to support transparent remote access to supercomputing facilities to students enrolled in an undergraduate parallel programming course. ...
Analysis of Performance-impacting Factors on Checkpointing Frameworks: The CPPC Case Study
(Oxford University Press, 2011-11-01)
[Abstract] This paper focuses on the performance evaluation of Compiler for Portable Checkpointing (CPPC), a tool for the checkpointing of parallel message-passing applications. Its performance and the factors that impact ...
A Heuristic Approach for the Automatic Insertion of Checkpoints in Message-Passing Codes
(Technische Universitaet Graz, Institut fuer Informationssysteme und Computer Medien; Graz University of Technology, Institute for Information Systems and Computer Media, 2009-08)
[Abstract] Checkpointing tools may be typically implemented at two different abstraction levels: at the system level or at the application level. The latter has become a more popular alternative due to its flexibility and ...
Improving Scalability of Application-Level Checkpoint-Recovery by Reducing Checkpoint Sizes
(Springer Japan KK, 2013)
[Abstract] The execution times of large-scale parallel applications on today's multi/many-core systems are usually longer than the mean time between failures. Therefore, parallel applications must tolerate hardware failures ...
Extending the Globus Information Service with the Common Information Model
(IEEE Computer Society, 2011-07-14)
[Abstract] The need for task-adapted and complete information for the management of resources is a well-known issue in Grid computing. Globus Toolkit 4 (GT4) includes the Monitoring and Discovery System component (MDS4) to ...
Integrating the common information model with MDS4
(IEEE Computer Society, 2008-10-31)
[Abstract] The management and monitoring of static and dynamic resources is a key issue in grid environments. Information models are an abstract representation of software and hardware aspects of these resources, a common ...
Parallelization of ARACNe, an Algorithm for the Reconstruction of Gene Regulatory Networks
(MDPI AG, 2019-07-31)
[Abstract] Gene regulatory networks are graphical representations of molecular regulators that interact with each other and with other substances in the cell to govern the gene expression. There are different computational ...
Exploiting locality in the run-time parallelization of irregular loops
(CRC Press, LLC, 2002-12-10)
[Abstract] The goal of this work is the efficient parallel execution of loops with indirect array accesses, in order to be embedded in a parallelizing compiler framework. In this kind of loop pattern, dependences cannot ...
In-memory application-level checkpoint-based migration for MPI programs
(Springer New York LLC, 2014)
[Abstract] Process migration provides many benefits for parallel environments including dynamic load balancing, data access locality or fault tolerance. This paper describes an in-memory application-level checkpoint-based ...