Showing items 1-10 of 18
Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM routine on Ampere GPUs
(Institute of Electrical and Electronics Engineers, 2022)
[Abstract] The Deep Learning (DL) community found in pruning techniques a good way to reduce the models' resource and energy consumption. These techniques lead to smaller sparse models, but sparse computations in GPUs ...
Automated and accurate cache behavior analysis for codes with irregular access patterns
(John Wiley & Sons Ltd., 2007-04-03)
[Abstract] The memory hierarchy plays an essential role in the performance of current computers, so good analysis tools that help in predicting and understanding its behavior are required. Analytical modeling is the ideal ...
Using Artificial Vision Techniques for Individual Player Tracking in Sport Events
(MDPI AG, 2019-07-31)
[Abstract] We introduce a hybrid approach that can track an individual football player in a video sequence. This solution achieves a good balance between speed and accuracy, combining traditional object tracking techniques ...
ScalaParBiBit: Scaling the Binary Biclustering in Distributed-Memory Systems
(SpringerLink, 2021-03-19)
[Abstract] Biclustering is a data mining technique that allows us to find groups of rows and columns that are highly correlated in a 2D dataset. Although there exist several software applications to perform biclustering, ...
OpenCNN: A Winograd Minimal Filtering Algorithm Implementation in CUDA
(MDPI, 2021)
[Abstract] Improving the performance of the convolution operation has become a key target for High Performance Computing (HPC) developers due to its prevalence in deep learning applied mainly to video processing. The ...
Facilitating the development of stencil applications using the Heterogeneous Programming Library
(2017)
[Abstract] Stencil computations are very common in scientific codes. Heterogeneous systems achieve good results solving these problems, but their programming is complex because of the ghost regions required in multi-device ...
High-performance dataflow computing in hybrid memory systems with UPC++ DepSpawn
(Springer, 2021)
[Abstract] Dataflow computing is a very attractive paradigm for high-performance computing, given its ability to trigger computations as soon as their inputs are available. UPC++ DepSpawn is a novel task-based library ...
A Software Cache Autotuning Strategy for Dataflow Computing with UPC++ DepSpawn
(Wiley, 2021)
[Abstract] Dataflow computing allows computations to start as soon as all their dependencies are satisfied. This is particularly useful in applications with irregular or complex patterns of dependencies which would otherwise ...
Developing adaptive multi-device applications with the Heterogeneous Programming Library
(Springer, 2015)
[Abstract] The usage of heterogeneous devices presents two main problems. One is their complex programming, a problem that grows when multiple devices are used. The second issue is that even if the codes for these devices ...
High Productivity Multi-device Exploitation with the Heterogeneous Programming Library
(Elsevier, 2016)
[Abstract] Heterogeneous devices require much more work from programmers than traditional CPUs, particularly when there are several of them, as each one has its own memory space. Multidevice applications require to distribute ...