Scalable PGAS collective operations in NUMA clusters

Mallón, Damián A.; Teijeiro Barjas, Carlos; González-Domínguez, Jorge; Taboada, Guillermo L.; Gómez, Andrés

dc.contributor.author	Mallón, Damián A.
dc.contributor.author	Teijeiro Barjas, Carlos
dc.contributor.author	González-Domínguez, Jorge
dc.contributor.author	Taboada, Guillermo L.
dc.contributor.author	Gómez, Andrés
dc.date.accessioned	2018-11-06T15:27:26Z
dc.date.available	2018-11-06T15:27:26Z
dc.date.issued	2014-12
dc.identifier.citation	Mallón, D.A., Taboada, G.L., Teijeiro, C. et al. Cluster Comput (2014) 17: 1473. https://doi.org/10.1007/s10586-014-0377-9	es_ES
dc.identifier.issn	1386-7857
dc.identifier.uri	http://hdl.handle.net/2183/21237
dc.description	This is a post-peer-review, pre-copyedit version of an article published in Cluster Computing. The final authenticated version is available online at: https://doi.org/10.1007/s10586-014-0377-9	es_ES
dc.description.abstract	[Abstract] The increasing number of cores per processor is making manycore-based systems pervasive. This involves dealing with multiple levels of memory in non-uniform memory access (NUMA) systems and with processor core hierarchies, accessible via complex interconnects in order to dispatch the increasing amount of data required by the processing elements. The key to efficient and scalable provision of data is the use of collective communication operations that minimize the impact of bottlenecks. Leveraging one-sided communications becomes more important in these systems, to avoid unnecessary synchronization between pairs of processes in collective operations implemented in terms of two-sided point-to-point functions. This work proposes a series of algorithms that provide good performance and scalability in collective operations, based on the use of hierarchical trees, overlapping one-sided communications, message pipelining and the available NUMA binding features. An implementation has been developed for Unified Parallel C, a Partitioned Global Address Space language, which presents a shared memory view across the nodes for programmability, while keeping private memory regions for performance. The performance evaluation of the proposed implementation, conducted on five representative systems (JuRoPA, JUDGE, Finis Terrae, SVG and Superdome), has shown generally good performance and scalability, even outperforming MPI in some cases, which confirms the suitability of the developed algorithms for manycore architectures.	es_ES
dc.description.sponsorship	Ministerio de Ciencia e Innovación; TIN2010-16735	es_ES
dc.description.sponsorship	Xunta de Galicia; CN2012/211	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Springer New York LLC	es_ES
dc.relation.uri	https://doi.org/10.1007/s10586-014-0377-9	es_ES
dc.subject	Manycore architectures	es_ES
dc.subject	Collective operations	es_ES
dc.subject	NUMA	es_ES
dc.subject	UPC	es_ES
dc.subject	PGAS	es_ES
dc.subject	MPI	es_ES
dc.subject	High performance computing	es_ES
dc.subject	Communication algorithms	es_ES
dc.title	Scalable PGAS collective operations in NUMA clusters	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.access	info:eu-repo/semantics/openAccess	es_ES
UDC.journalTitle	Cluster Computing: the journal of networks, software tools and applications	es_ES
UDC.volume	17	es_ES
UDC.issue	4	es_ES
UDC.startPage	1473	es_ES
UDC.endPage	1495	es_ES
dc.identifier.doi	10.1007/s10586-014-0377-9

Files in this item

Name: D.A. Mallón 2014_Scalable PGAS ...
Size: 639.2Kb
Format: PDF

This item appears in the following collection(s)

GI-GAC - Artigos [181]