STuning-DL: Model-Driven Autotuning of Sparse GPU Kernels for Deep Learning
Use this link to cite
http://hdl.handle.net/2183/36810
Unless otherwise indicated, the item's license is described as Attribution 4.0 International
Collections
- GI-GAC - Articles [187]
Metadata
Title
STuning-DL: Model-Driven Autotuning of Sparse GPU Kernels for Deep Learning
Date
2024-05
Bibliographic citation
R. L. Castro, D. Andrade and B. B. Fraguela, "STuning-DL: Model-Driven Autotuning of Sparse GPU Kernels for Deep Learning," in IEEE Access, vol. 12, pp. 70581-70599, 2024, doi: 10.1109/ACCESS.2024.3402326.
Abstract
[Abstract]: The relentless growth of modern Machine Learning models has spurred the adoption of sparsification techniques that simplify their architectures and reduce their computational demands. Network pruning has proven able to maintain the original network accuracy while shedding significant portions of the original weights. However, exploiting this sparsity efficiently remains challenging due to computational irregularities, particularly in GPU kernels. A new generation of template-based GPU kernels for semi-structured sparsity shows promising efficiency but lacks autotuning capabilities to adapt to input dynamics, often underperforming in scenarios for which it has not been meticulously hand-tuned. We present STuning-DL, the first pruning-aware autotuner for third-party template-based implementations, which enables efficient optimization of sparse kernels for Deep Learning, spanning from high-level aspects (CUDA C++ level) down to GPU-native instruction specifics (assembly level). STuning-DL tunes and optimizes sparse kernels' performance at run time for each input problem, yielding speedups of up to 5.42× on an NVIDIA T4-16GB GPU and up to 3.6× on an NVIDIA A100-40GB GPU on sparse matrices from real-world models, compared to the existing heuristics of sparse libraries such as cuSparse and cuSparseLt.
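To make the operation concrete: the kernels the abstract refers to implement SpMM, multiplying a pruned (sparse) weight matrix by a dense activation matrix. A minimal CPU sketch of that computation using SciPy is shown below; the matrix sizes and the 80% pruning ratio are illustrative assumptions, and nothing here reproduces the paper's GPU autotuning itself.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Hypothetical pruned weight matrix: ~80% of entries zeroed, stored in CSR.
dense_w = rng.standard_normal((128, 256))
dense_w[rng.random(dense_w.shape) > 0.2] = 0.0
w_csr = sparse.csr_matrix(dense_w)

# Dense activations.
x = rng.standard_normal((256, 64))

# SpMM: sparse weights times dense activations.
y = w_csr @ x

# The sparse product matches the dense one within floating-point tolerance.
assert np.allclose(y, dense_w @ x)
```

Libraries such as cuSparse and cuSparseLt provide GPU implementations of this operation; STuning-DL's contribution is selecting and tuning the best kernel configuration for each input problem at run time.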
Keywords
CUDA
GPU
Learning-based predictive model
Network pruning
Sparse computation
SpMM
Tensor Core
Publisher's version
Rights
Attribution 4.0 International
ISSN
2169-3536