Show simple item record

dc.contributor.author: Mallón, Damián A.
dc.contributor.author: Taboada, Guillermo L.
dc.contributor.author: Koesterke, Lars
dc.date.accessioned: 2018-11-07T15:27:24Z
dc.date.available: 2018-11-07T15:27:24Z
dc.date.issued: 2016-05-06
dc.identifier.citation: Mallón, D. A., Taboada, G. L., & Koesterke, L. (2016). MPI and UPC broadcast, scatter and gather algorithms in Xeon Phi. Concurrency and Computation: Practice and Experience, 28(8), 2322-2340.
dc.identifier.issn: 1532-0626
dc.identifier.issn: 1532-0634
dc.identifier.uri: http://hdl.handle.net/2183/21246
dc.description: This is the peer reviewed version of the following article: Mallón, D. A., Taboada, G. L., & Koesterke, L. (2016). MPI and UPC broadcast, scatter and gather algorithms in Xeon Phi. Concurrency and Computation: Practice and Experience, 28(8), 2322-2340, which has been published in final form at https://doi.org/10.1002/cpe.3552. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.
dc.description.abstract: [Abstract] Accelerators have revolutionised the high performance computing (HPC) community. Despite their advantages, their very specific programming models and limited communication capabilities have kept them in a supporting role to the main processors. With the introduction of Xeon Phi, this is no longer true, as it can be programmed as the main processor and has direct access to the InfiniBand network adapter. Collective operations play a key role in many HPC applications. Therefore, studying their behaviour in the context of manycore coprocessors is of great importance. This work analyses the performance of different algorithms for broadcast, scatter and gather in a large-scale Xeon Phi supercomputer. The algorithms evaluated are those available in the reference message passing interface (MPI) implementation for Xeon Phi (Intel MPI), the default algorithm in an optimised MPI implementation (MVAPICH2-MIC), and a new set of algorithms, developed by the authors of this work, designed with modern processors and new communication features in mind. The latter are implemented in Unified Parallel C (UPC), a partitioned global address space language, leveraging one-sided communications, hierarchical trees and message pipelining. This study scales the experiments to 15360 cores in the Stampede supercomputer and compares the results to Xeon and hybrid Xeon + Xeon Phi experiments, with up to 19456 cores.
dc.description.sponsorship: National Science Foundation; OCI-1134872
dc.language.iso: eng
dc.publisher: John Wiley & Sons Ltd.
dc.relation.uri: https://doi.org/10.1002/cpe.3552
dc.subject: Collective operations
dc.subject: Xeon Phi
dc.subject: Manycore
dc.subject: UPC
dc.subject: MPI
dc.subject: InfiniBand
dc.title: MPI and UPC broadcast, scatter and gather algorithms in Xeon Phi
dc.type: info:eu-repo/semantics/article
dc.rights.access: info:eu-repo/semantics/openAccess
UDC.journalTitle: Concurrency and Computation: Practice & Experience
UDC.volume: 28
UDC.issue: 8
UDC.startPage: 2322
UDC.endPage: 2340
dc.identifier.doi: 10.1002/cpe.3552
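
For readers unfamiliar with the three collectives named in the abstract, the sketch below shows the standard MPI broadcast, scatter and gather calls in C. It is a generic illustration only: it does not reproduce the authors' UPC algorithms (one-sided communications, hierarchical trees, message pipelining), nor the Intel MPI or MVAPICH2-MIC internals evaluated in the article.

/* Minimal sketch of the three MPI collectives studied in the article
 * (broadcast, scatter, gather). Generic MPI usage, not the authors'
 * UPC implementation or any vendor-specific algorithm. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Broadcast: the root rank sends the same value to every rank. */
    int value = (rank == 0) ? 42 : 0;
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Scatter: the root distributes one distinct element to each rank. */
    int *sendbuf = NULL;
    if (rank == 0) {
        sendbuf = malloc(nprocs * sizeof(int));
        for (int i = 0; i < nprocs; i++) sendbuf[i] = i * i;
    }
    int chunk;
    MPI_Scatter(sendbuf, 1, MPI_INT, &chunk, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Gather: the root collects one element back from each rank. */
    chunk += value;
    int *recvbuf = (rank == 0) ? malloc(nprocs * sizeof(int)) : NULL;
    MPI_Gather(&chunk, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("gathered[%d] = %d\n", nprocs - 1, recvbuf[nprocs - 1]);
        free(sendbuf);
        free(recvbuf);
    }
    MPI_Finalize();
    return 0;
}

The sketch is compiled with an MPI wrapper compiler and launched with an MPI job launcher, e.g. `mpicc collectives.c -o collectives` followed by `mpirun -np 4 ./collectives` (the file name is arbitrary). The article's contribution lies in how such collectives are implemented at scale on Xeon Phi, not in this calling interface.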

