VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores

López Castro, Roberto; Ivanov, Andrei; Andrade, Diego; Ben-Nun, Tal; Fraguela, Basilio B.; Hoefler, Torsten

dc.contributor.author	López Castro, Roberto
dc.contributor.author	Ivanov, Andrei
dc.contributor.author	Andrade, Diego
dc.contributor.author	Ben-Nun, Tal
dc.contributor.author	Fraguela, Basilio B.
dc.contributor.author	Hoefler, Torsten
dc.date.accessioned	2024-02-07T09:39:15Z
dc.date.available	2024-02-07T09:39:15Z
dc.date.issued	2023-11
dc.identifier.citation	Roberto L. Castro, Andrei Ivanov, Diego Andrade, Tal Ben-Nun, Basilio B. Fraguela, and Torsten Hoefler. 2023. VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '23). Association for Computing Machinery, New York, NY, USA, Article 72, 1–14. https://doi.org/10.1145/3581784.3607087	es_ES
dc.identifier.uri	http://hdl.handle.net/2183/35468
dc.description	© 2023 Authors | ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in International Conference for High Performance Computing, Networking, Storage and Analysis, https://doi.org/10.1145/3581784.3607087	es_ES
dc.description.abstract	[Abstract]: The increasing success and scaling of Deep Learning models demand higher computational efficiency and power. Sparsification can lead to both smaller models and higher compute efficiency, and accelerated hardware is becoming available. However, exploiting it efficiently requires kernel implementations, pruning algorithms, and storage formats that can utilize the hardware support of specialized sparse vector units. One example is NVIDIA's Sparse Tensor Cores (SPTCs), which promise a 2× speedup. However, SPTCs only support the 2:4 format, limiting achievable sparsity ratios to 50%. We present the V:N:M format, which enables the execution of arbitrary N:M ratios on SPTCs. To efficiently exploit the resulting format, we propose Spatha, a high-performance sparse library for DL routines. We show that Spatha achieves up to 37× speedup over cuBLAS. We also demonstrate a second-order pruning technique that enables sparsification to high sparsity ratios with V:N:M and little to no loss in accuracy in modern transformers.	es_ES
dc.description.sponsorship	This research was supported by the Ministry of Science and Innovation of Spain (grants PID2019-104184RB-I00 and PID2022-136435NB-I00, funded by MCIN/AEI/ 10.13039/501100011033, PID2022 also funded by "ERDF A way of making Europe", EU), the Ministry of Education (predoctoral grant of Roberto L. Castro, FPU19/03974), by Xunta de Galicia under the Consolidation Program of Competitive Reference Groups (ED431C 2021/30), and ERC grant PSAP, no. 101002047. We also acknowledge the support from CITIC, funded by Xunta de Galicia and FEDER funds of the EU (Centro de Investigación de Galicia accreditation 2019-2022, ED431G 2019/01). Finally, we thank the Swiss National Supercomputing Center (CSCS) and the Centro de Supercomputación de Galicia (CESGA) for the use of their computers.	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431C 2021/30	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431G 2019/01	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Association for Computing Machinery	es_ES
dc.relation	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-104184RB-I00/ES/DESAFIOS ACTUALES EN HPC: ARQUITECTURAS, SOFTWARE Y APLICACIONES/	es_ES
dc.relation	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-136435NB-I00/ES/RESULTADOS DE INVESTIGACIÓN PROYECTOS ARQUITECTURAS, FRAMEWORKS Y APLICACIONES DE LA COMPUTACION DE ALTAS PRESTACIONES	es_ES
dc.relation	info:eu-repo/grantAgreement/MECD/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/FPU19%2F03974/ES/	es_ES
dc.relation	info:eu-repo/grantAgreement/EC/H2020/101002047	es_ES
dc.relation.isversionof	https://doi.org/10.1145/3581784.3607087
dc.relation.uri	https://doi.org/10.1145/3581784.3607087	es_ES
dc.rights	© 2023 Authors | ACM.	es_ES
dc.subject	Sparse Tensor Cores	es_ES
dc.subject	GPU	es_ES
dc.subject	Pruning	es_ES
dc.subject	Sparsification	es_ES
dc.subject	CUDA	es_ES
dc.title	VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores	es_ES
dc.type	info:eu-repo/semantics/conferenceObject	es_ES
dc.rights.access	info:eu-repo/semantics/openAccess	es_ES
UDC.issue	72	es_ES
UDC.startPage	1	es_ES
UDC.endPage	14	es_ES
dc.identifier.doi	10.1145/3581784.3607087
UDC.conferenceTitle	International Conference for High Performance Computing, Networking, Storage and Analysis	es_ES

Files in this item

Name: LopezCastro_Roberto_2023_VENOM ...
Size: 1.260Mb
Format: PDF
Description: Accepted version

This item appears in the following collection(s)

GI-GAC - Congresos, conferencias, etc. [53]
OpenAIRE [266]