Showing items 1-10 of 19
Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM routine on Ampere GPUs
(Institute of Electrical and Electronics Engineers, 2022)
[Abstract] The Deep Learning (DL) community found in pruning techniques a good way to reduce the models' resource and energy consumption. These techniques lead to smaller sparse models, but sparse computations in GPUs ...
Automated and accurate cache behavior analysis for codes with irregular access patterns
(John Wiley & Sons Ltd., 2007-04-03)
[Abstract] The memory hierarchy plays an essential role in the performance of current computers, so good analysis tools that help in predicting and understanding its behavior are required. Analytical modeling is the ideal ...
Using Artificial Vision Techniques for Individual Player Tracking in Sport Events
(MDPI AG, 2019-07-31)
[Abstract] We introduce a hybrid approach that can track an individual football player in a video sequence. This solution achieves a good balance between speed and accuracy, combining traditional object tracking techniques ...
The New UPC++ DepSpawn High Performance Library for Data-Flow Computing with Hybrid Parallelism
(Springer, 2022)
[Abstract] Data-flow computing is a natural and convenient paradigm for expressing parallelism. This is particularly true for tools that automatically extract the data dependencies among the tasks while making it possible to exploit ...
A Software Cache Autotuning Strategy for Dataflow Computing with UPC++ DepSpawn
(Wiley, 2021)
[Abstract] Dataflow computing allows computations to start as soon as all their dependencies are satisfied. This is particularly useful in applications with irregular or complex patterns of dependencies which would otherwise ...
Developing adaptive multi-device applications with the Heterogeneous Programming Library
(Springer, 2015)
[Abstract] The usage of heterogeneous devices presents two main problems. One is their complex programming, a problem that grows when multiple devices are used. The second issue is that even if the codes for these devices ...
High Productivity Multi-device Exploitation with the Heterogeneous Programming Library
(Elsevier, 2016)
[Abstract] Heterogeneous devices require much more work from programmers than traditional CPUs, particularly when there are several of them, as each one has its own memory space. Multidevice applications require distributing ...
STuning-DL: Model-Driven Autotuning of Sparse GPU Kernels for Deep Learning
(Institute of Electrical and Electronics Engineers, 2024-05)
[Abstract] The relentless growth of modern Machine Learning models has spurred the adoption of sparsification techniques to simplify their architectures and reduce the computational demands. Network pruning has demonstrated ...
High-performance dataflow computing in hybrid memory systems with UPC++ DepSpawn
(Springer, 2021)
[Abstract] Dataflow computing is a very attractive paradigm for high-performance computing, given its ability to trigger computations as soon as their inputs are available. UPC++ DepSpawn is a novel task-based library ...
Easy Dataflow Programming in Clusters with UPC++ DepSpawn
(Institute of Electrical and Electronics Engineers, 2019-06-01)
[Abstract] The Partitioned Global Address Space (PGAS) programming model is one of the most relevant proposals to improve the ability of developers to exploit distributed memory systems. However, despite its important ...