Adapt-S: Effective DNN Pruning via Unified Accuracy and Performance Tuning

Castro, Roberto L.; Andrade, Diego; Fraguela, Basilio B.

UDC.coleccion	Investigación
UDC.conferenceTitle	IPDPS 2025
UDC.departamento	Enxeñaría de Computadores
UDC.grupoInv	Grupo de Arquitectura de Computadores (GAC)
UDC.institutoCentro	CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación
dc.contributor.author	Castro, Roberto L.
dc.contributor.author	Andrade, Diego
dc.contributor.author	Fraguela, Basilio B.
dc.date.accessioned	2025-10-20T12:56:15Z
dc.date.available	2025-10-20T12:56:15Z
dc.date.issued	2025-07
dc.description	Paper presented at: 2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 03-07 June 2025, Milan, Italy. © 2025 IEEE. This version of the paper has been accepted for publication. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The final published paper is available online at: https://doi.org/10.1109/IPDPS64566.2025.00018
dc.description.abstract	[Abstract]: Model sparsification has emerged as a promising approach to reducing model size with minimum impact on accuracy. This is achieved through the removal of some model parameters, a process also known as Deep Neural Network (DNN) pruning. The irregular nature of the generated sparse tensors poses a great challenge in the development of efficient GPU kernels optimized for these workloads. This challenge has been recently addressed through the use of hardware-aware semistructured sparsification methods designed to conform to specialized sparse formats and codesigned with template-based kernel implementations. These methods are commonly based on grouping the non-pruned values in blocks of a given size to generate regularity or on generating patterns that fit specialized hardware units. This pruning pattern-format-kernel triplet presents a high degree of tunability, both at the pruning and kernel sides, which can be used to fit certain accuracy-to-performance tradeoffs. On the pruning side, using larger blocks of consecutive non-pruned values favors performance over accuracy, as the weight selection for removal policy becomes less flexible. On the kernel side, recent studies have proven that the tuning of the configuration parameters of template-based multi-level tiling kernel implementations can yield an extra performance boost. This paper presents AdAPT-S, an autotuning system that generates DNN pruning recipes and optimized kernel configurations to fit an accuracy-to-performance specification. This is done through a cost model that integrates both aspects. AdAPT-S gets extra benefits from the exploitation of layer sensitivity by providing per-layer pruning recipes and kernel configurations. The results show that our approach can achieve superior accuracy-to-performance trade-offs and that this can be used to produce models that fit the user requirements.
dc.description.sponsorship	This research was supported by grants PID2019-104184RB-I00 and PID2022-136435NB-I00, funded by MCIN/AEI/10.13039/501100011033, PID2022 also funded by “ERDF A way of making Europe”, EU, the predoctoral grant of Roberto L. Castro (FPU19/03974), and by the Xunta de Galicia co-funded by the European Regional Development Fund (ERDF) under the Consolidation Programme of Competitive Reference Groups (ED431C 2021/30). CITIC, as a center accredited for excellence within the Galician University System and a member of the CIGUS Network, receives subsidies from the Department of Education, Science, Universities, and Vocational Training of the Xunta de Galicia. Additionally, it is co-financed by the EU through the FEDER Galicia 2021-27 operational program (Ref. ED431G 2023/01). Innovation Study ESPLAG has received funding through the Inno4scale project, which is funded by the European High-Performance Computing Joint Undertaking (JU) under Grant Agreement No 101118139. The JU receives support from the European Union’s Horizon Europe Programme.
dc.description.sponsorship	Xunta de Galicia; ED431C 2021/30
dc.description.sponsorship	Xunta de Galicia; ED431G 2023/01
dc.identifier.citation	R. L. Castro, D. Andrade, and B. B. Fraguela, "Adapt-S: Effective DNN Pruning via Unified Accuracy and Performance Tuning", 2025 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 03-07 June 2025, Milan, Italy, doi: 10.1109/IPDPS64566.2025.00018
dc.identifier.doi	10.1109/IPDPS64566.2025.00018
dc.identifier.isbn	979-8-3315-3237-6
dc.identifier.issn	1530-2075
dc.identifier.uri	https://hdl.handle.net/2183/46024
dc.language.iso	eng
dc.publisher	IEEE
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-104184RB-I00/ES/DESAFÍOS ACTUALES EN HPC: ARQUITECTURAS, SOFTWARE Y APLICACIONES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-136435NB-I00/ES/ARQUITECTURAS, FRAMEWORKS Y APLICACIONES DE LA COMPUTACION DE ALTAS PRESTACIONES
dc.relation.projectID	info:eu-repo/grantAgreement/MCIU/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/FPU19%2F03974/ES/
dc.relation.projectID	info:eu-repo/grantAgreement/EC/HE/101118139
dc.relation.uri	https://doi.org/10.1109/IPDPS64566.2025.00018
dc.rights	© 2025, IEEE
dc.rights.accessRights	open access
dc.subject	DNN Pruning
dc.subject	Tensor Core Unit
dc.subject	GPU programming
dc.subject	CUDA
dc.title	Adapt-S: Effective DNN Pruning via Unified Accuracy and Performance Tuning
dc.type	conference output
dspace.entity.type	Publication
relation.isAuthorOfPublication	ba3b1a6d-65dd-4366-a7d4-f6c802c5f07a
relation.isAuthorOfPublication	7f5bae1c-08f6-4204-b22a-fbe20407a6e4
relation.isAuthorOfPublication.latestForDiscovery	ba3b1a6d-65dd-4366-a7d4-f6c802c5f07a

Files

Original bundle

Name: Andrade_Diego_2025_AdaptS.pdf
Size: 1.46 MB
Format: Adobe Portable Document Format

Collections

Investigación (FIC)