Improving Scalability of Application-Level Checkpoint-Recovery by Reducing Checkpoint Sizes

Cores González, Iván; Rodríguez, Gabriel; Martín, María J.; González, Patricia; Osorio, Roberto

dc.contributor.author	Cores González, Iván
dc.contributor.author	Rodríguez, Gabriel
dc.contributor.author	Martín, María J.
dc.contributor.author	González, Patricia
dc.contributor.author	Osorio, Roberto
dc.date.accessioned	2018-08-06T10:18:29Z
dc.date.available	2018-08-06T10:18:29Z
dc.date.issued	2013
dc.identifier.citation	Cores, I., Rodríguez, G., Martín, M.J. et al. New Gener. Comput. (2013) 31: 163. https://doi.org/10.1007/s00354-013-0302-4	es_ES
dc.identifier.issn	0288-3635
dc.identifier.issn	1882-7055
dc.identifier.uri	http://hdl.handle.net/2183/20945
dc.description	This is a post-peer-review, pre-copyedit version of an article published in New Generation Computing. The final authenticated version is available online at: https://doi.org/10.1007/s00354-013-0302-4	es_ES
dc.description.abstract	[Abstract] The execution times of large-scale parallel applications on today's multi/many-core systems are usually longer than the mean time between failures. Therefore, parallel applications must tolerate hardware failures so that a machine failure does not wipe out all the computation done so far. Checkpointing and rollback recovery is one of the most popular techniques for implementing fault-tolerant applications. However, checkpointing parallel applications is expensive in terms of computing time, network utilization, and storage resources. Thus, checkpoint-recovery techniques should minimize these costs in order to be useful for large-scale systems. In this paper, three different and complementary techniques to reduce the size of the checkpoints generated by application-level checkpointing are proposed and implemented. Detailed experimental results obtained on a multicore cluster show the effectiveness of the proposed methods in reducing checkpointing cost.	es_ES
dc.description.sponsorship	Ministerio de Ciencia e Innovación; TIN2010-16735	es_ES
dc.description.sponsorship	Galicia. Consellería de Economía e Industria; 10PXIB105180PR	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Springer Japan KK	es_ES
dc.relation.uri	https://doi.org/10.1007/s00354-013-0302-4	es_ES
dc.subject	Parallel programming	es_ES
dc.subject	Message passing	es_ES
dc.subject	MPI	es_ES
dc.subject	Fault tolerance	es_ES
dc.subject	Checkpointing	es_ES
dc.title	Improving Scalability of Application-Level Checkpoint-Recovery by Reducing Checkpoint Sizes	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.access	info:eu-repo/semantics/openAccess	es_ES
UDC.journalTitle	New Generation Computing	es_ES
UDC.volume	31	es_ES
UDC.issue	3	es_ES
UDC.startPage	163	es_ES
UDC.endPage	185	es_ES
dc.identifier.doi	10.1007/s00354-013-0302-4

Files in this item

Name: I.Cores_Improving_Scalability_ ...
Size: 589.9Kb
Format: PDF

This item appears in the following collection(s)

GI-GAC - Artigos [193]