dc.contributor.author: Losada, Nuria
dc.contributor.author: Martín, María J.
dc.contributor.author: González, Patricia
dc.date.accessioned: 2018-07-10T17:18:53Z
dc.date.available: 2018-07-10T17:18:53Z
dc.date.issued: 2017-01
dc.identifier.citation: Losada, N., Martín, M.J. & González, P. J Supercomput (2017) 73: 316. https://doi.org/10.1007/s11227-016-1863-z [es_ES]
dc.identifier.issn: 0920-8542
dc.identifier.issn: 1573-0484
dc.identifier.uri: http://hdl.handle.net/2183/20891
dc.description: This is a post-peer-review, pre-copyedit version of an article published in Journal of Supercomputing. The final authenticated version is available online at: https://doi.org/10.1007/s11227-016-1863-z [es_ES]
dc.description.abstract: [Abstract] The Message Passing Interface (MPI) standard is the most popular parallel programming model for distributed systems. However, it lacks fault-tolerance support, and failures have traditionally been addressed with stop-and-restart checkpointing solutions. The User Level Failure Mitigation (ULFM) proposal for including resilience capabilities in the MPI standard opens new opportunities in this field, enabling resilient MPI applications, i.e., applications that can detect and react to failures without stopping their execution. This work compares the performance of a traditional stop-and-restart checkpointing solution with its equivalent resilient proposal. Both approaches are built on top of ComPiler for Portable Checkpointing (CPPC), an application-level checkpointing tool for MPI applications, and both transparently obtain fault-tolerant MPI applications from generic MPI Single Program Multiple Data (SPMD) programs. The evaluation focuses on the scalability of the two solutions, comparing both proposals using up to 3072 cores. [es_ES]
dc.description.sponsorship: Ministerio de Economía y Competitividad; TIN2013-42148-P [es_ES]
dc.description.sponsorship: Ministerio de Economía y Competitividad; BES-2014-068066 [es_ES]
dc.description.sponsorship: Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; GRC2013/055 [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: Springer New York LLC [es_ES]
dc.relation.uri: https://doi.org/10.1007/s11227-016-1863-z [es_ES]
dc.subject: Resilience [es_ES]
dc.subject: Checkpointing [es_ES]
dc.subject: Fault tolerance [es_ES]
dc.subject: MPI [es_ES]
dc.title: Assessing resilient versus stop-and-restart fault-tolerant solutions in MPI applications [es_ES]
dc.type: info:eu-repo/semantics/article [es_ES]
dc.rights.access: info:eu-repo/semantics/openAccess [es_ES]
UDC.journalTitle: Journal of Supercomputing [es_ES]
UDC.volume: 73 [es_ES]
UDC.issue: 1 [es_ES]
UDC.startPage: 316 [es_ES]
UDC.endPage: 329 [es_ES]
dc.identifier.doi: 10.1007/s11227-016-1863-z
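The abstract contrasts two ways of recovering from a failure with application-level checkpoints: stop-and-restart, where the failed execution ends and a new launch reloads the last checkpoint, and the resilient approach, where the running application detects the failure and rolls back without being relaunched. The sketch below illustrates only that control-flow difference, under loud assumptions: it is a toy single-process Python simulation, it does not use MPI, CPPC, or ULFM, and every name in it (`ProcessFailure`, `FailureInjector`, `run_job`, `stop_and_restart`, `resilient`, the JSON checkpoint file) is hypothetical. In a real ULFM-based code, the surviving processes would also have to repair the communicator (ULFM proposes operations such as `MPIX_Comm_revoke` and `MPIX_Comm_shrink`), which is not modeled here.

```python
import json
import os
import tempfile


class ProcessFailure(Exception):
    """Hypothetical stand-in for a detected process failure (e.g. a lost MPI rank)."""


class FailureInjector:
    """Hypothetical helper: raises ProcessFailure once, the first time `fail_at` is reached."""

    def __init__(self, fail_at):
        self.fail_at = fail_at
        self.fired = False

    def check(self, i):
        if i == self.fail_at and not self.fired:
            self.fired = True
            raise ProcessFailure(f"simulated failure at iteration {i}")


def step(state):
    # One iteration of a toy computation: add the iteration index to a running total.
    return {"i": state["i"] + 1, "total": state["total"] + state["i"]}


def save_checkpoint(path, state):
    # Write to a temporary file, then rename it, so an interrupted write
    # never leaves a truncated checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    # Resume from the last checkpoint, or start from scratch if none exists.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"i": 0, "total": 0}


def run_job(path, n_iters, interval, injector):
    # One launch of the application; a failure aborts the whole launch.
    state = load_checkpoint(path)
    while state["i"] < n_iters:
        injector.check(state["i"])
        state = step(state)
        if state["i"] % interval == 0:
            save_checkpoint(path, state)
    return state


def stop_and_restart(path, n_iters, interval, injector):
    # Stop-and-restart: each loop pass models a fresh launch of the job,
    # which reloads the checkpoint written by the previous, failed launch.
    launches = 0
    while True:
        launches += 1
        try:
            return run_job(path, n_iters, interval, injector), launches
        except ProcessFailure:
            continue


def resilient(path, n_iters, interval, injector):
    # Resilient: the running execution catches the failure, rolls back to the
    # last checkpoint and continues within the same launch.
    recoveries = 0
    state = load_checkpoint(path)
    while state["i"] < n_iters:
        try:
            injector.check(state["i"])
        except ProcessFailure:
            recoveries += 1
            state = load_checkpoint(path)
            continue
        state = step(state)
        if state["i"] % interval == 0:
            save_checkpoint(path, state)
    return state, recoveries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(stop_and_restart(os.path.join(d, "a.json"), 100, 10, FailureInjector(57)))
        print(resilient(os.path.join(d, "b.json"), 100, 10, FailureInjector(57)))
```

With a single failure injected at iteration 57 and a checkpoint every 10 iterations, both variants roll back to iteration 50 and finish with a total of 4950; the stop-and-restart variant needs two launches, while the resilient one records one in-place recovery in a single launch. The paper's point concerns the cost of each path at scale, which a toy like this cannot measure.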

