Improving Scalability of Application-Level Checkpoint-Recovery by Reducing Checkpoint Sizes
| UDC.coleccion | Investigación | es_ES |
| UDC.departamento | Enxeñaría de Computadores | es_ES |
| UDC.endPage | 185 | es_ES |
| UDC.grupoInv | Grupo de Arquitectura de Computadores (GAC) | es_ES |
| UDC.issue | 3 | es_ES |
| UDC.journalTitle | New Generation Computing | es_ES |
| UDC.startPage | 163 | es_ES |
| UDC.volume | 31 | es_ES |
| dc.contributor.author | Cores González, Iván | |
| dc.contributor.author | Rodríguez, Gabriel | |
| dc.contributor.author | Martín, María J. | |
| dc.contributor.author | González, Patricia | |
| dc.contributor.author | Osorio, Roberto | |
| dc.date.accessioned | 2018-08-06T10:18:29Z | |
| dc.date.available | 2018-08-06T10:18:29Z | |
| dc.date.issued | 2013 | |
| dc.description | This is a post-peer-review, pre-copyedit version of an article published in New Generation Computing. The final authenticated version is available online at: https://doi.org/10.1007/s00354-013-0302-4 | es_ES |
| dc.description.abstract | [Abstract] The execution times of large-scale parallel applications on today's multi/many-core systems are usually longer than the mean time between failures. Therefore, parallel applications must tolerate hardware failures so that a machine failure does not discard all the computation already done. Checkpointing and rollback recovery is one of the most popular techniques to implement fault-tolerant applications. However, checkpointing parallel applications is expensive in terms of computing time, network utilization and storage resources. Thus, current checkpoint-recovery techniques should minimize these costs in order to be useful for large-scale systems. In this paper, three different and complementary techniques to reduce the size of the checkpoints generated by application-level checkpointing are proposed and implemented. Detailed experimental results obtained on a multicore cluster show the effectiveness of the proposed methods in reducing checkpointing cost. | es_ES |
| dc.description.sponsorship | Ministerio de Ciencia e Innovación; TIN2010-16735 | es_ES |
| dc.description.sponsorship | Galicia. Consellería de Economía e Industria; 10PXIB105180PR | es_ES |
| dc.identifier.citation | Cores, I., Rodríguez, G., Martín, M.J. et al. New Gener. Comput. (2013) 31: 163. https://doi.org/10.1007/s00354-013-0302-4 | es_ES |
| dc.identifier.doi | 10.1007/s00354-013-0302-4 | |
| dc.identifier.issn | 0288-3635 | |
| dc.identifier.issn | 1882-7055 | |
| dc.identifier.uri | http://hdl.handle.net/2183/20945 | |
| dc.language.iso | eng | es_ES |
| dc.publisher | Springer Japan KK | es_ES |
| dc.relation.uri | https://doi.org/10.1007/s00354-013-0302-4 | es_ES |
| dc.rights.accessRights | open access | es_ES |
| dc.subject | Parallel programming | es_ES |
| dc.subject | Message passing | es_ES |
| dc.subject | MPI | es_ES |
| dc.subject | Fault tolerance | es_ES |
| dc.subject | Checkpointing | es_ES |
| dc.title | Improving Scalability of Application-Level Checkpoint-Recovery by Reducing Checkpoint Sizes | es_ES |
| dc.type | journal article | es_ES |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | 040e0007-80e8-4213-b049-be346ac2b018 | |
| relation.isAuthorOfPublication | e432b4b1-5ead-41aa-b165-d69608b06626 | |
| relation.isAuthorOfPublication | 049797cb-6695-43ea-8f32-efc754fbfda6 | |
| relation.isAuthorOfPublication | 0ed2a744-9046-4c62-8300-a17ef95bea86 | |
| relation.isAuthorOfPublication | eac2943b-5be2-46e9-9816-09ae10df6b76 | |
| relation.isAuthorOfPublication.latestForDiscovery | 040e0007-80e8-4213-b049-be346ac2b018 |
Files
Original bundle
- Name: I.Cores_Improving_Scalability_of_Application-Level_Checkpoint-Recovery_by_Reducing_Checkpoint_Sizes_2013.pdf
- Size: 589.92 KB
- Format: Adobe Portable Document Format

