
Efficient Scan Operator Methods on a GPU

Amor_Margarita_2014_Efficient_Scan_Operator_Methods_on_a_GPU.pdf - Accepted version (500.8 KB)
Use this link to cite
http://hdl.handle.net/2183/40781
Collections
  • Investigación (FIC) [1678]
Metadata
Title
Efficient Scan Operator Methods on a GPU
Author(s)
Pérez Diéguez, Adrián
Amor, Margarita
Doallo, Ramón
Date
2014
Bibliographic citation
A. P. Diéguez, M. Amor and R. Doallo, "Efficient Scan Operator Methods on a GPU," 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing, Paris, France, 2014, pp. 190-197, doi: 10.1109/SBAC-PAD.2014.23.
Abstract
Current GPUs (Graphics Processing Units) offer high computational power at relatively low cost; nonetheless, this enhanced performance often comes at the cost of reduced flexibility and greater code complexity. Efficient GPU programming requires detailed knowledge of certain hardware aspects. The scan operator is an important building block for a wide range of algorithms. In this paper, we present a number of parallel scan methods based on the traditional cyclic reduction tridiagonal solver and the Ladner-Fischer parallel prefix adder. Furthermore, we analyze a set of new features introduced in the NVIDIA Kepler architecture, such as the read-only data cache and shuffle instructions. Our methods provide excellent performance in many cases, with up to a 48% improvement over the CUDA Data Parallel Primitives (CUDPP) library.
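For readers unfamiliar with the scan operator, the following is a minimal CPU sketch of an inclusive scan (prefix sum) using the minimum-depth prefix network usually associated with the Ladner-Fischer family (often called the Sklansky pattern). It is an illustration only, not the authors' GPU kernels: the function name `inclusive_scan` and its parameters are hypothetical, and the inner loop, which is sequential here, stands in for the threads that would execute each step in parallel on a GPU.

```python
from operator import add


def inclusive_scan(values, op=add):
    """Inclusive scan using a Sklansky-style (Ladner-Fischer family) network.

    Hypothetical helper for illustration; `op` must be associative.
    Runs ceil(log2(n)) steps; the updates inside one step are independent
    of each other, so a GPU could perform them in parallel.
    """
    x = list(values)
    n = len(x)
    d = 0
    while (1 << d) < n:
        for i in range(n):
            if (i >> d) & 1:
                # Last element of the lower half of the block of size 2^(d+1).
                # Bit d of `src` is 0, so it is never written during this step.
                src = (i & ~((1 << d) - 1)) - 1
                x[i] = op(x[src], x[i])
        d += 1
    return x


if __name__ == "__main__":
    print(inclusive_scan([1, 2, 3, 4, 5, 6, 7, 8]))
    # prints [1, 3, 6, 10, 15, 21, 28, 36]
```

Because the left operand always comes from the lower index, the sketch also works for associative operators that are not commutative, such as string concatenation.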
Keywords
Instruction sets
Proposals
Graphics processing units
Kernel
Complexity theory
Arrays
Registers
 
Description
Presented at: 2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing, Paris, France, 22-24 October 2014
 
This version of the article has been accepted for publication, after peer review. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The Version of Record is available online at: https://doi.org/10.1109/SBAC-PAD.2014.23
 
Publisher's version
https://doi.org/10.1109/SBAC-PAD.2014.23
Rights
© 2014 IEEE.
ISSN
1550-6533
