
dc.contributor.author: Losada, Nuria
dc.contributor.author: Cores González, Iván
dc.contributor.author: Martín, María J.
dc.contributor.author: González, Patricia
dc.date.accessioned: 2018-07-10T14:29:26Z
dc.date.available: 2018-07-10T14:29:26Z
dc.date.issued: 2017-01
dc.identifier.citation: Losada, N., Cores, I., Martín, M.J. et al. J Supercomput (2017) 73: 100. https://doi.org/10.1007/s11227-016-1629-7 [es_ES]
dc.identifier.issn: 0920-8542
dc.identifier.issn: 1573-0484
dc.identifier.uri: http://hdl.handle.net/2183/20890
dc.description: This is a post-peer-review, pre-copyedit version of an article published in Journal of Supercomputing. The final authenticated version is available online at: https://doi.org/10.1007/s11227-016-1629-7 [es_ES]
dc.description.abstract: [Abstract] Future exascale systems, formed by millions of cores, will present high failure rates, and long-running applications will need to make use of new fault tolerance techniques to ensure successful execution completion. The Fault Tolerance Working Group, within the MPI forum, has presented the User Level Failure Mitigation (ULFM) proposal, providing new functionalities for the implementation of resilient MPI applications. In this work, the CPPC checkpointing framework is extended to exploit the new ULFM functionalities. The proposed solution transparently obtains resilient MPI applications by instrumenting the original application code. Besides, a multithreaded multilevel checkpointing, in which the checkpoint files are saved in different memory levels, improves the scalability of the solution. The experimental evaluation shows a low overhead when tolerating failures in one or several MPI processes. [es_ES]
dc.description.sponsorship: Ministerio de Economía y Competitividad; TIN2013-42148-P [es_ES]
dc.description.sponsorship: Ministerio de Economía y Competitividad; TIN2014-53522-REDT [es_ES]
dc.description.sponsorship: Ministerio de Economía y Competitividad; BES-2014-068066 [es_ES]
dc.description.sponsorship: Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; GRC2013/055 [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: Springer New York LLC [es_ES]
dc.relation.uri: https://doi.org/10.1007/s11227-016-1629-7 [es_ES]
dc.subject: Resilience [es_ES]
dc.subject: Checkpointing [es_ES]
dc.subject: Fault tolerance [es_ES]
dc.subject: MPI [es_ES]
dc.title: Resilient MPI applications using an application-level checkpointing framework and ULFM [es_ES]
dc.type: info:eu-repo/semantics/article [es_ES]
dc.rights.access: info:eu-repo/semantics/openAccess [es_ES]
UDC.journalTitle: Journal of Supercomputing [es_ES]
UDC.volume: 73 [es_ES]
UDC.issue: 1 [es_ES]
UDC.startPage: 100 [es_ES]
UDC.endPage: 113 [es_ES]
dc.identifier.doi: 10.1007/s11227-016-1629-7

