Assessing resilient versus stop-and-restart fault-tolerant solutions in MPI applications
dc.contributor.author | Losada, Nuria | |
dc.contributor.author | Martín, María J. | |
dc.contributor.author | González, Patricia | |
dc.date.accessioned | 2018-07-10T17:18:53Z | |
dc.date.available | 2018-07-10T17:18:53Z | |
dc.date.issued | 2017-01 | |
dc.identifier.citation | Losada, N., Martín, M.J. & González, P. J Supercomput (2017) 73: 316. https://doi.org/10.1007/s11227-016-1863-z | es_ES |
dc.identifier.issn | 0920-8542 | |
dc.identifier.issn | 1573-0484 | |
dc.identifier.uri | http://hdl.handle.net/2183/20891 | |
dc.description | This is a post-peer-review, pre-copyedit version of an article published in Journal of Supercomputing. The final authenticated version is available online at: https://doi.org/10.1007/s11227-016-1863-z | es_ES |
dc.description.abstract | [Abstract] The Message Passing Interface (MPI) standard is the most popular parallel programming model for distributed systems. However, it lacks fault-tolerance support and, traditionally, failures are addressed with stop-and-restart checkpointing solutions. The proposal of User Level Failure Mitigation (ULFM) for the inclusion of resilience capabilities in the MPI standard provides new opportunities in this field, allowing the implementation of resilient MPI applications, i.e., applications that are able to detect and react to failures without stopping their execution. This work compares the performance of a traditional stop-and-restart checkpointing solution with its equivalent resilience proposal. Both approaches are built on top of ComPiler for Portable Checkpointing (CPPC), an application-level checkpointing tool for MPI applications, and they make it possible to transparently obtain fault-tolerant MPI applications from generic MPI Single Program Multiple Data (SPMD) programs. The evaluation focuses on the scalability of the two solutions, comparing both proposals using up to 3072 cores. | es_ES |
dc.description.sponsorship | Ministerio de Economía y Competitividad; TIN2013-42148-P | es_ES |
dc.description.sponsorship | Ministerio de Economía y Competitividad; BES-2014-068066 | es_ES |
dc.description.sponsorship | Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; GRC2013/055 | es_ES |
dc.language.iso | eng | es_ES |
dc.publisher | Springer New York LLC | es_ES |
dc.relation.uri | https://doi.org/10.1007/s11227-016-1863-z | es_ES |
dc.subject | Resilience | es_ES |
dc.subject | Checkpointing | es_ES |
dc.subject | Fault tolerance | es_ES |
dc.subject | MPI | es_ES |
dc.title | Assessing resilient versus stop-and-restart fault-tolerant solutions in MPI applications | es_ES |
dc.type | info:eu-repo/semantics/article | es_ES |
dc.rights.access | info:eu-repo/semantics/openAccess | es_ES |
UDC.journalTitle | Journal of Supercomputing | es_ES |
UDC.volume | 73 | es_ES |
UDC.issue | 1 | es_ES |
UDC.startPage | 316 | es_ES |
UDC.endPage | 329 | es_ES |
dc.identifier.doi | 10.1007/s11227-016-1863-z |
This item appears in the following collection(s)
- GI-GAC - Artigos [193]