Showing items 1-10 of 38
Probing the Efficacy of Hardware-Aware Weight Pruning to Optimize the SpMM routine on Ampere GPUs
(Institute of Electrical and Electronics Engineers, 2022)
[Abstract] The Deep Learning (DL) community found in pruning techniques a good way to reduce the models' resource and energy consumption. These techniques lead to smaller sparse models, but sparse computations in GPUs ...
Automated and accurate cache behavior analysis for codes with irregular access patterns
(John Wiley & Sons Ltd., 2007-04-03)
[Abstract] The memory hierarchy plays an essential role in the performance of current computers, so good analysis tools that help in predicting and understanding its behavior are required. Analytical modeling is the ideal ...
The New UPC++ DepSpawn High Performance Library for Data-Flow Computing with Hybrid Parallelism
(Springer, 2022)
[Abstract] Data-flow computing is a natural and convenient paradigm for expressing parallelism. This is particularly true for tools that automatically extract the data dependencies among the tasks while allowing to exploit ...
A Software Cache Autotuning Strategy for Dataflow Computing with UPC++ DepSpawn
(Wiley, 2021)
[Abstract] Dataflow computing allows to start computations as soon as all their dependencies are satisfied. This is particularly useful in applications with irregular or complex patterns of dependencies which would otherwise ...
GPU Accelerated Molecular Docking Simulation with Genetic Algorithms
(Springer, Cham, 2016)
[Abstract] Receptor-Ligand Molecular Docking is a very computationally expensive process used to predict possible drug candidates for many diseases. A faster docking technique would help life scientists to discover better ...
Accelerating the HyperLogLog Cardinality Estimation Algorithm
(Hindawi, 2017)
[Abstract] In recent years, vast amounts of data of different kinds, from pictures and videos from our cameras to software logs from sensor networks and Internet routers operating day and night, are being generated. This ...
Developing adaptive multi-device applications with the Heterogeneous Programming Library
(Springer, 2015)
[Abstract] The usage of heterogeneous devices presents two main problems. One is their complex programming, a problem that grows when multiple devices are used. The second issue is that even if the codes for these devices ...
A general and efficient divide-and-conquer algorithm framework for multi-core clusters
(SpringerLink, 2017)
[Abstract] Divide-and-conquer is one of the most important patterns of parallelism, being applicable to a large variety of problems. In addition, the most powerful parallel systems available nowadays are computer clusters ...
A framework for argument-based task synchronization with automatic detection of dependencies
(Elsevier, 2013)
[Abstract] Synchronization in parallel applications can be achieved either implicitly or explicitly. Implicit synchronization is typical of programming environments that provide predefined, and often simple, patterns of ...
High Productivity Multi-device Exploitation with the Heterogeneous Programming Library
(Elsevier, 2016)
[Abstract] Heterogeneous devices require much more work from programmers than traditional CPUs, particularly when there are several of them, as each one has its own memory space. Multidevice applications require to distribute ...