
dc.contributor.author: Mallón, Damián A.
dc.contributor.author: Teijeiro Barjas, Carlos
dc.contributor.author: González-Domínguez, Jorge
dc.contributor.author: Taboada, Guillermo L.
dc.contributor.author: Gómez, Andrés
dc.date.accessioned: 2018-11-06T15:27:26Z
dc.date.available: 2018-11-06T15:27:26Z
dc.date.issued: 2014-12
dc.identifier.citation: Mallón, D.A., Taboada, G.L., Teijeiro, C. et al. Cluster Comput (2014) 17: 1473. https://doi.org/10.1007/s10586-014-0377-9 [es_ES]
dc.identifier.issn: 1386-7857
dc.identifier.uri: http://hdl.handle.net/2183/21237
dc.description: This is a post-peer-review, pre-copyedit version of an article published in Cluster Computing. The final authenticated version is available online at: https://doi.org/10.1007/s10586-014-0377-9 [es_ES]
dc.description.abstract: [Abstract] The increasing number of cores per processor is making manycore-based systems pervasive. This involves dealing with multiple levels of memory in non-uniform memory access (NUMA) systems and with processor core hierarchies, accessible via complex interconnects, in order to dispatch the increasing amount of data required by the processing elements. The key to efficient and scalable provision of data is the use of collective communication operations that minimize the impact of bottlenecks. Leveraging one-sided communications becomes more important in these systems, to avoid the unnecessary synchronization between pairs of processes that arises in collective operations implemented in terms of two-sided point-to-point functions. This work proposes a series of algorithms that provide good performance and scalability in collective operations, based on the use of hierarchical trees, overlapping one-sided communications, message pipelining and the available NUMA binding features. An implementation has been developed for Unified Parallel C, a Partitioned Global Address Space language, which presents a shared memory view across the nodes for programmability, while keeping private memory regions for performance. The performance evaluation of the proposed implementation, conducted on five representative systems (JuRoPA, JUDGE, Finis Terrae, SVG and Superdome), has shown generally good performance and scalability, even outperforming MPI in some cases, which confirms the suitability of the developed algorithms for manycore architectures. [es_ES]
dc.description.sponsorship: Ministerio de Ciencia e Innovación; TIN2010-16735 [es_ES]
dc.description.sponsorship: Xunta de Galicia; CN2012/211 [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: Springer New York LLC [es_ES]
dc.relation.uri: https://doi.org/10.1007/s10586-014-0377-9 [es_ES]
dc.subject: Manycore architectures [es_ES]
dc.subject: Collective operations [es_ES]
dc.subject: NUMA [es_ES]
dc.subject: UPC [es_ES]
dc.subject: PGAS [es_ES]
dc.subject: MPI [es_ES]
dc.subject: High performance computing [es_ES]
dc.subject: Communication algorithms [es_ES]
dc.title: Scalable PGAS collective operations in NUMA clusters [es_ES]
dc.type: info:eu-repo/semantics/article [es_ES]
dc.rights.access: info:eu-repo/semantics/openAccess [es_ES]
UDC.journalTitle: Cluster Computing: the journal of networks, software tools and applications [es_ES]
UDC.volume: 17 [es_ES]
UDC.issue: 4 [es_ES]
UDC.startPage: 1473 [es_ES]
UDC.endPage: 1495 [es_ES]
dc.identifier.doi: 10.1007/s10586-014-0377-9

