Showing items 1-5 of 5

Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM routine on Ampere GPUs

López Castro, Roberto; Andrade, Diego; Fraguela, Basilio B. (Institute of Electrical and Electronics Engineers, 2022)

[Abstract]: The Deep Learning (DL) community found in pruning techniques a good way to reduce the models' resource and energy consumption. These techniques lead to smaller sparse models, but sparse computations in GPUs ...

Efficient high-precision integer multiplication on the GPU

Pérez Diéguez, Adrián; Amor, Margarita; Doallo, Ramón; Nukada, Akira; Matsuoka, Satoshi (SAGE Journals, 2022-03)

[Abstract]: The multiplication of large integers, which has many applications in computer science, is an operation that can be expressed as a polynomial multiplication followed by a carry normalization. This work develops ...

OpenCNN: A Winograd Minimal Filtering Algorithm Implementation in CUDA

López Castro, Roberto; Andrade, Diego; Fraguela, Basilio B. (MDPI, 2021)

[Abstract]: Improving the performance of the convolution operation has become a key target for High Performance Computing (HPC) developers due to its prevalence in deep learning applied mainly to video processing. The ...

CUDA-JMI: Acceleration of feature selection on heterogeneous systems

González-Domínguez, Jorge; Expósito, Roberto R.; Bolón-Canedo, Verónica (Elsevier, 2020-01)

[Abstract]: Feature selection is a crucial step nowadays in machine learning and data analytics to remove irrelevant and redundant characteristics and thus to provide fast and reliable analyses. Many research works have ...

VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores

López Castro, Roberto; Ivanov, Andrei; Andrade, Diego; Ben-Nun, Tal; Fraguela, Basilio B.; Hoefler, Torsten (Association for Computing Machinery, 2023-11)

[Abstract]: The increasing success and scaling of Deep Learning models demands higher computational efficiency and power. Sparsification can lead to both smaller models as well as higher compute efficiency, and accelerated ...