SPLG: A Tuned Signal Processing Library for GPU Architectures

Lobeiras Blanco, Jacobo; Amor, Margarita; Doallo, Ramón

doi:10.1109/SBAC-PAD.2013.30


Author(s)

Lobeiras Blanco, Jacobo

Amor, Margarita

Doallo, Ramón

Date

2013

Citation

J. L. Blanco, M. Amor and R. Doallo, "SPLG: A Tuned Signal Processing Library for GPU Architectures," 2013 25th International Symposium on Computer Architecture and High Performance Computing, Porto de Galinhas, Brazil, 2013, pp. 184-191, doi: 10.1109/SBAC-PAD.2013.30

Abstract

[Abstract]: To increase the efficiency of existing software, many works are incorporating GPU processing. However, despite recent advances in GPU languages and tools, exploiting their parallel architecture is still far more complex than programming standard multi-core CPUs. Performance profiling and analysis of known applications provide useful insight into the hardware architecture and memory hierarchy. This analysis can then be used to identify potential bottlenecks and to tune other software so that it makes more efficient use of the available resources. In this work we implement a small signal processing library, which we use to characterize the performance of the most recent NVIDIA GPU architectures. The methodology used in our library is based on a series of building blocks that let us design several well-known algorithms with little effort. The library was built with special attention to flexibility and adaptability. We also show how a generic approach can be used to design these GPU algorithms easily while obtaining competitive performance, which is especially interesting from a productivity standpoint.
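The abstract describes building several well-known transforms (the keywords name FFT, DCT and Hartley) from a small set of shared building blocks. As a hedged, CPU-side illustration of that idea only, the sketch below is not SPLG's API or its GPU implementation; the function names `butterfly`, `fft` and `dht` are hypothetical. A radix-2 butterfly is the reusable block, a recursive FFT is composed from it, and a discrete Hartley transform is then derived from the FFT output for real input, using H[k] = Re(X[k]) - Im(X[k]).

```python
import cmath
import math


def butterfly(a, b, w):
    """Radix-2 butterfly (hypothetical building block): returns (a + w*b, a - w*b)."""
    t = w * b
    return a + t, a - t


def fft(x):
    """Recursive radix-2 decimation-in-time FFT composed from butterfly().

    Assumes len(x) is a power of two; raises ValueError otherwise.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * math.pi * k / n)  # twiddle factor W_n^k
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], w)
    return out


def dht(x):
    """Discrete Hartley transform of a real sequence, reusing fft().

    Since X[k] = sum x[n](cos - i sin), Re(X[k]) - Im(X[k]) = sum x[n] cas(2*pi*n*k/N).
    """
    return [c.real - c.imag for c in fft(x)]


if __name__ == "__main__":
    print(fft([1, 2, 3, 4]))
    print(dht([1, 2, 3, 4]))
```

Reusing one butterfly routine across transforms mirrors, in a very simplified way, the productivity argument of the abstract: tuning the shared block benefits every algorithm built on it.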

Keywords

CUDA
DCT
FFT
GPGPU
Hartley
Signal processing
Tuned library

Description

Presented at: 25th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2013, 23 through 26 October 2013

This version of the article has been accepted for publication, after peer review. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The Version of Record is available online at: https://doi.org/10.1109/SBAC-PAD.2013.30