STuning-DL: Model-Driven Autotuning of Sparse GPU Kernels for Deep Learning

López Castro, Roberto; Andrade, Diego; Fraguela, Basilio B.

dc.contributor.author	López Castro, Roberto
dc.contributor.author	Andrade, Diego
dc.contributor.author	Fraguela, Basilio B.
dc.date.accessioned	2024-06-05T14:30:12Z
dc.date.available	2024-06-05T14:30:12Z
dc.date.issued	2024-05
dc.identifier.citation	R. L. Castro, D. Andrade and B. B. Fraguela, "STuning-DL: Model-Driven Autotuning of Sparse GPU Kernels for Deep Learning," in IEEE Access, vol. 12, pp. 70581-70599, 2024, doi: 10.1109/ACCESS.2024.3402326.	es_ES
dc.identifier.issn	2169-3536
dc.identifier.uri	http://hdl.handle.net/2183/36810
dc.description.abstract	[Abstract]: The relentless growth of modern Machine Learning models has spurred the adoption of sparsification techniques to simplify their architectures and reduce their computational demands. Network pruning has demonstrated success in maintaining original network accuracy while shedding significant portions of the original weights. However, leveraging this sparsity efficiently remains challenging due to computational irregularities, particularly in GPU kernels. A new trend of template-based GPU kernels for semi-structured sparsity shows promise in efficiency but lacks autotuning capabilities to adapt to input dynamics, often underperforming in scenarios where they have not been meticulously hand-tuned. We present STuning-DL, the first pruning-aware autotuner for third-party template-based implementations, enabling efficient optimization of sparse kernels for Deep Learning, spanning from high-level aspects (CUDA C++ level) down to GPU-native instruction specifics (assembly level). STuning-DL tunes and optimizes the performance of sparse kernels at run time for each input problem, yielding speedups of up to 5.42× on NVIDIA T4-16GB and up to 3.6× on NVIDIA A100-40GB GPUs on sparse matrices from real-world models, compared to existing heuristics from sparse libraries like cuSPARSE and cuSPARSELt.	es_ES
dc.description.sponsorship	This work was supported by grant PID2022-136435NB-I00, funded by MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”, EU; also by Xunta de Galicia under the Consolidation Programme of Competitive Reference Groups, ref. ED431C 2021/30. The work of Roberto L. Castro was supported by a predoctoral grant from the Ministry of Science, Innovation and Universities, ref. FPU19/03974.	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431C 2021/30	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Institute of Electrical and Electronics Engineers	es_ES
dc.relation	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-136435NB-I00/ES/ARQUITECTURAS, FRAMEWORKS Y APLICACIONES DE LA COMPUTACION DE ALTAS PRESTACIONES	es_ES
dc.relation	info:eu-repo/grantAgreement//Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/FPU19%2F03974/ES/	es_ES
dc.relation.uri	https://doi.org/10.1109/ACCESS.2024.3402326	es_ES
dc.rights	Attribution 4.0 International	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/es/	*
dc.subject	CUDA	es_ES
dc.subject	GPU	es_ES
dc.subject	Learning-based predictive model	es_ES
dc.subject	Network pruning	es_ES
dc.subject	Sparse computation	es_ES
dc.subject	SpMM	es_ES
dc.subject	Tensor Core	es_ES
dc.title	STuning-DL: Model-Driven Autotuning of Sparse GPU Kernels for Deep Learning	es_ES
dc.type	info:eu-repo/semantics/article	es_ES
dc.rights.access	info:eu-repo/semantics/openAccess	es_ES
UDC.journalTitle	IEEE Access	es_ES
UDC.volume	12	es_ES
UDC.startPage	70581	es_ES
UDC.endPage	70599	es_ES
dc.identifier.doi	10.1109/ACCESS.2024.3402326

Files in this item

Name: Castro_RobertoL_2024_STuning-D ...
Size: 1.898Mb
Format: PDF

Name: license_rdf
Size: 1.337Kb
Format: application/rdf+xml

This item appears in the following collection(s)

GI-GAC - Artigos [189]