BPLG–BMCS: GPU-sorting algorithm using a tuning skeleton library

Use this link to cite
http://hdl.handle.net/2183/20960
Collections
- Investigación (FIC) [1678]
Title
BPLG–BMCS: GPU-sorting algorithm using a tuning skeleton library
Date
2017
Citation
Diéguez, A.P., Amor, M. & Doallo, R. J Supercomput (2017) 73: 4. https://doi.org/10.1007/s11227-015-1591-9
Abstract
In this work, we present an efficient and portable sorting operator for GPUs. Specifically, we propose an algorithmic variant of the bitonic merge sort which reduces the number of processing stages and internal steps, increasing the workload per thread and focusing on a multi-batch execution for multiple problems of a small size. This proposal is well matched to current GPU architectures and we apply different CUDA optimizations to improve performance. For portability, we use a library based on tuning building blocks. Thanks to this parametrization, the library can easily be tuned for different CUDA GPU architectures. Our proposals obtain competitive performance on two recent NVIDIA GPU architectures, providing an improvement of up to 11,794 × over CUDPP and up to 6467 × over ModernGPU.
Keywords
GPU
CUDA
Tuning
Building blocks
Bitonic merge sort
Description
This is a post-peer-review, pre-copyedit version of an article published in Journal of Supercomputing. The final authenticated version is available online at: https://doi.org/10.1007/s11227-015-1591-9
Publisher's version
ISSN
0920-8542
1573-0484