Showing items 1-8 of 8
Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM routine on Ampere GPUs
(Institute of Electrical and Electronics Engineers, 2022)
[Abstract]: The Deep Learning (DL) community has found pruning techniques to be a good way to reduce models' resource and energy consumption. These techniques lead to smaller sparse models, but sparse computations in GPUs ...
ScalaParBiBit: Scaling the Binary Biclustering in Distributed-Memory Systems
(SpringerLink, 2021-03-19)
[Abstract]: Biclustering is a data mining technique that allows us to find groups of rows and columns that are highly correlated in a 2D dataset. Although there exist several software applications to perform biclustering, ...
OpenCNN: A Winograd Minimal Filtering Algorithm Implementation in CUDA
(MDPI, 2021)
[Abstract]: Improving the performance of the convolution operation has become a key target for High Performance Computing (HPC) developers due to its prevalence in deep learning applied mainly to video processing. The ...
High-performance dataflow computing in hybrid memory systems with UPC++ DepSpawn
(Springer, 2021)
[Abstract]: Dataflow computing is a very attractive paradigm for high-performance computing, given its ability to trigger computations as soon as their inputs are available. UPC++ DepSpawn is a novel task-based library ...
A Software Cache Autotuning Strategy for Dataflow Computing with UPC++ DepSpawn
(Wiley, 2021)
[Abstract]: Dataflow computing allows computations to start as soon as all their dependencies are satisfied. This is particularly useful in applications with irregular or complex patterns of dependencies which would otherwise ...
An automatic optimizer for heterogeneous devices
(Elsevier, 2020-05)
[Abstract]: Codes written in a naive way seldom effectively exploit the computing resources, while writing optimized codes is usually a complex task that requires a certain level of expertise. This problem is further compounded ...
The New UPC++ DepSpawn High Performance Library for Data-Flow Computing with Hybrid Parallelism
(Springer, 2022)
[Abstract]: Data-flow computing is a natural and convenient paradigm for expressing parallelism. This is particularly true for tools that automatically extract the data dependencies among the tasks while allowing users to exploit ...
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
(Association for Computing Machinery, 2023-11)
[Abstract]: The increasing success and scaling of Deep Learning models demand higher computational efficiency and power. Sparsification can lead to both smaller models as well as higher compute efficiency, and accelerated ...