Show simple item record

dc.contributor.author: Cores González, Iván
dc.contributor.author: Rodríguez, Gabriel
dc.contributor.author: González, Patricia
dc.contributor.author: Martín, María J.
dc.date.accessioned: 2018-08-07T09:34:02Z
dc.date.available: 2018-08-07T09:34:02Z
dc.date.issued: 2014
dc.identifier.citation: Iván Cores, Gabriel Rodríguez, Patricia González, María J. Martín; Failure Avoidance in MPI Applications Using an Application-Level Approach, The Computer Journal, Volume 57, Issue 1, 1 January 2014, Pages 100–114, https://doi.org/10.1093/comjnl/bxs158
dc.identifier.issn: 0010-4620
dc.identifier.issn: 1460-2067
dc.identifier.uri: http://hdl.handle.net/2183/20947
dc.description.abstract: [Abstract] Execution times of large-scale computational science and engineering parallel applications are usually longer than the mean-time-between-failures. For this reason, hardware failures must be tolerated by the applications to ensure that not all computation done is lost on machine failures. Checkpointing and rollback recovery is one of the most popular techniques to provide fault tolerance support to parallel applications. However, when a failure occurs, most checkpointing mechanisms require a complete restart of the parallel application from the last checkpoint. New advances in the prediction of hardware failures have led to the development of proactive process migration approaches, where tasks are migrated in a preventive way when node failures are anticipated, avoiding the restart of the whole application. The work presented in this paper extends an application-level checkpointing framework to proactively migrate message passing interface (MPI) processes when impending failures are notified, without having to restart the entire application. The main features of the proposed solution are: low overhead in failure-free executions, avoiding the checkpoint dumping associated to rolling back strategies; low overhead at migration time, by means of the design of a light and asynchronous protocol to achieve a consistent global state; transparency for the user, thanks to the use of a compiler tool and a runtime library and portability, as it is not locked into a particular architecture, operating system or MPI implementation.
dc.description.sponsorship: Ministerio de Ciencia e Innovación; TIN2010-16735
dc.description.sponsorship: Galicia. Consellería de Economía e Industria; 10PXIB105180PR
dc.language.iso: eng
dc.publisher: Oxford University Press
dc.relation.uri: https://doi.org/10.1093/comjnl/bxs158
dc.subject: Failure avoidance
dc.subject: Proactive migration
dc.subject: Checkpointing
dc.subject: Message passing
dc.title: Failure Avoidance in MPI Applications Using an Application-Level Approach
dc.type: info:eu-repo/semantics/article
dc.rights.access: info:eu-repo/semantics/openAccess
UDC.journalTitle: The Computer Journal
UDC.volume: 57
UDC.issue: 1
UDC.startPage: 100
UDC.endPage: 114
dc.identifier.doi: 10.1093/comjnl/bxs158

