Show simple item record

dc.contributor.author: López Castro, Roberto
dc.contributor.author: Ivanov, Andrei
dc.contributor.author: Andrade, Diego
dc.contributor.author: Ben-Nun, Tal
dc.contributor.author: Fraguela, Basilio B.
dc.contributor.author: Hoefler, Torsten
dc.date.accessioned: 2024-02-07T09:39:15Z
dc.date.available: 2024-02-07T09:39:15Z
dc.date.issued: 2023-11
dc.identifier.citation: Roberto L. Castro, Andrei Ivanov, Diego Andrade, Tal Ben-Nun, Basilio B. Fraguela, and Torsten Hoefler. 2023. VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '23). Association for Computing Machinery, New York, NY, USA, Article 72, 1–14. https://doi.org/10.1145/3581784.3607087 [es_ES]
dc.identifier.uri: http://hdl.handle.net/2183/35468
dc.description: © 2023 Authors | ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in International Conference for High Performance Computing, Networking, Storage and Analysis, https://doi.org/10.1145/3581784.3607087 [es_ES]
dc.description.abstract: [Abstract]: The increasing success and scaling of Deep Learning models demand higher computational efficiency and power. Sparsification can lead to both smaller models and higher compute efficiency, and accelerated hardware is becoming available. However, exploiting it efficiently requires kernel implementations, pruning algorithms, and storage formats that utilize the hardware support of specialized sparse vector units. One example is NVIDIA's Sparse Tensor Cores (SPTCs), which promise a 2× speedup. However, SPTCs only support the 2:4 format, limiting achievable sparsity ratios to 50%. We present the V:N:M format, which enables the execution of arbitrary N:M ratios on SPTCs. To efficiently exploit the resulting format, we propose Spatha, a high-performance sparse library for DL routines. We show that Spatha achieves up to 37× speedup over cuBLAS. We also demonstrate a second-order pruning technique that enables sparsification to high sparsity ratios with V:N:M and little to no loss in accuracy in modern transformers. [es_ES]
dc.description.sponsorship: This research was supported by the Ministry of Science and Innovation of Spain (grants PID2019-104184RB-I00 and PID2022-136435NB-I00, funded by MCIN/AEI/10.13039/501100011033, PID2022 also funded by "ERDF A way of making Europe", EU), the Ministry of Education (predoctoral grant of Roberto L. Castro, FPU19/03974), by Xunta de Galicia under the Consolidation Program of Competitive Reference Groups (ED431C 2021/30), and ERC grant PSAP, no. 101002047. We also acknowledge the support from CITIC, funded by Xunta de Galicia and FEDER funds of the EU (Centro de Investigación de Galicia accreditation 2019-2022, ED431G 2019/01). Finally, we thank the Swiss National Supercomputing Center (CSCS) and the Centro de Supercomputación de Galicia (CESGA) for the use of their computers. [es_ES]
dc.description.sponsorship: Xunta de Galicia; ED431C 2021/30 [es_ES]
dc.description.sponsorship: Xunta de Galicia; ED431G 2019/01 [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: Association for Computing Machinery [es_ES]
dc.relation: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-104184RB-I00/ES/DESAFIOS ACTUALES EN HPC: ARQUITECTURAS, SOFTWARE Y APLICACIONES/ [es_ES]
dc.relation: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-136435NB-I00/ES/RESULTADOS DE INVESTIGACIÓN PROYECTOS ARQUITECTURAS, FRAMEWORKS Y APLICACIONES DE LA COMPUTACION DE ALTAS PRESTACIONES [es_ES]
dc.relation: info:eu-repo/grantAgreement/MECD/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/FPU19%2F03974/ES/ [es_ES]
dc.relation: info:eu-repo/grantAgreement/EC/H2020/101002047 [es_ES]
dc.relation.isversionof: https://doi.org/10.1145/3581784.3607087
dc.relation.uri: https://doi.org/10.1145/3581784.3607087 [es_ES]
dc.rights: © 2023 Authors | ACM. [es_ES]
dc.subject: Sparse Tensor Cores [es_ES]
dc.subject: GPU [es_ES]
dc.subject: Pruning [es_ES]
dc.subject: Sparsification [es_ES]
dc.subject: CUDA [es_ES]
dc.title: VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores [es_ES]
dc.type: info:eu-repo/semantics/conferenceObject [es_ES]
dc.rights.access: info:eu-repo/semantics/openAccess [es_ES]
UDC.issue: 72 [es_ES]
UDC.startPage: 1 [es_ES]
UDC.endPage: 14 [es_ES]
dc.identifier.doi: 10.1145/3581784.3607087
UDC.conferenceTitle: International Conference for High Performance Computing, Networking, Storage and Analysis [es_ES]

