Designing Efficient Index-Digit Algorithms for CUDA GPU Architectures

Use this link to cite
http://hdl.handle.net/2183/40697
Collections
- Investigación (FIC) [1615]
Metadata
Title
Designing Efficient Index-Digit Algorithms for CUDA GPU Architectures
Date
2016-05
Citation
J. Lobeiras, M. Amor and R. Doallo, "Designing Efficient Index-Digit Algorithms for CUDA GPU Architectures," in IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 5, pp. 1331-1343, 1 May 2016, doi: 10.1109/TPDS.2015.2450718.
Abstract
Modern graphics processing units (GPUs) offer very high computing power at relatively low cost. Nevertheless, designing efficient algorithms for GPUs normally requires additional time and effort, even for experienced programmers. In this work we present a tuning methodology for designing index-digit algorithms on CUDA-enabled GPU architectures, that is, algorithms where the data movement can be described as permutations of the digits comprising the indices of the data elements. This methodology, based on two stages identified as GPU resource analysis and operators string manipulation, is applied to FFT and tridiagonal systems solver algorithms, analyzing the performance features and the most adequate solutions. The resulting implementation is compact and outperforms other well-known and commonly used state-of-the-art libraries, with an improvement of up to 19.2 percent over NVIDIA's complex CUFFT, and more than 3000 percent over NVIDIA's CUDPP for real data tridiagonal systems.
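To make the notion of an index-digit permutation concrete, the following is a minimal host-side sketch in Python. It is not the authors' implementation and does not reproduce their operators string notation; the function names `permute_index_digits` and `apply_index_digit_permutation` and the `digit_order` convention are assumptions introduced here for illustration only. It shows two classic data movements, the bit reversal used in radix-2 FFTs and the perfect shuffle, each expressed purely as a reordering of the binary digits of the element indices.

```python
def permute_index_digits(index, digit_order, radix=2):
    """Reorder the base-`radix` digits of `index`.

    Assumed convention (illustrative, not from the paper):
    digit_order[k] is the source digit position (0 = least significant)
    whose value is placed at position k of the result.
    """
    n = len(digit_order)
    # Extract the n digits of the index, least significant first.
    digits = [(index // radix**p) % radix for p in range(n)]
    # Rebuild the index with the digits in their new positions.
    return sum(digits[src] * radix**k for k, src in enumerate(digit_order))


def apply_index_digit_permutation(data, digit_order, radix=2):
    """Move data[i] to the position given by permuting the digits of i."""
    n = len(digit_order)
    if len(data) != radix**n:
        raise ValueError("len(data) must equal radix ** len(digit_order)")
    out = [None] * len(data)
    for i, value in enumerate(data):
        out[permute_index_digits(i, digit_order, radix)] = value
    return out


data = list(range(8))  # 8 elements, so each index has 3 binary digits

# Bit reversal: digit 0 <-> digit 2, digit 1 stays in place.
bit_reversed = apply_index_digit_permutation(data, [2, 1, 0])

# Perfect shuffle: cyclic rotation of the digits, which interleaves
# the first half of the data with the second half.
shuffled = apply_index_digit_permutation(data, [2, 0, 1])
```

Here `bit_reversed` is `[0, 4, 2, 6, 1, 5, 3, 7]` and `shuffled` is `[0, 4, 1, 5, 2, 6, 3, 7]`. On a GPU, which digits of the index map to threads, registers, and shared or global memory addresses is exactly the kind of decision the abstract's methodology is concerned with; this sketch only illustrates the permutation itself.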
Keywords
CUDA
FFT
GPGPU
operators string
tridiagonal systems solver
tuning
Description
This version of the article has been accepted for publication, after peer review. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The Version of Record is available online at: https://doi.org/10.1109/TPDS.2015.2450718.
Rights
© 2016 IEEE.