Adaptive incremental transfer learning for efficient performance modeling of big data workloads

Garralda-Barrio, Mariano; Eiras-Franco, Carlos; Bolón-Canedo, Verónica

UDC.coleccion	Investigación	es_ES
UDC.departamento	Ciencias da Computación e Tecnoloxías da Información	es_ES
UDC.grupoInv	Laboratorio de Investigación e Desenvolvemento en Intelixencia Artificial (LIDIA)	es_ES
UDC.institutoCentro	CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación	es_ES
UDC.issue	107730	es_ES
UDC.journalTitle	Future Generation Computer Systems	es_ES
UDC.volume	166	es_ES
dc.contributor.author	Garralda-Barrio, Mariano
dc.contributor.author	Eiras-Franco, Carlos
dc.contributor.author	Bolón-Canedo, Verónica
dc.date.accessioned	2025-01-30T08:46:07Z
dc.date.embargoEndDate	2027-05-01	es_ES
dc.date.embargoLift	2027-05-01
dc.date.issued	2025-05
dc.description	This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article has been accepted for publication in Future Generation Computer Systems. The Version of Record is available online at https://doi.org/10.1016/j.future.2025.107730	es_ES
dc.description	The algorithms, evaluation metrics, cross-validation methods, and resources used in this study are openly available at https://github.com/mgarralda/garralda-performance-model. Further details can be obtained from the corresponding author on reasonable request.	es_ES
dc.description.abstract	[Abstract]: The rise of data-intensive scalable computing systems, such as Apache Spark, has transformed data processing by enabling the efficient manipulation of large datasets across machine clusters. However, system configuration to optimize performance remains a challenge. This paper introduces an adaptive incremental transfer learning approach to predicting workload execution times. By integrating both unsupervised and supervised learning, we develop models that adapt incrementally to new workloads and configurations. To guide the optimal selection of relevant workloads, the model employs the coefficient of distance variation (CdV) and the coefficient of quality correlation (CqC), combined in the exploration–exploitation balance coefficient (EEBC). Comprehensive evaluations demonstrate the robustness and reliability of our model for performance modeling in Spark applications, with average improvements of up to 31% over state-of-the-art methods. This research contributes to efficient performance tuning systems by enabling transfer learning from historical workloads to new, previously unseen workloads. The full source code is openly available.	es_ES
dc.description.sponsorship	This work has been supported by the National Plan for Scientific and Technical Research and Innovation of the Spanish Government, Spain (Grants PID2019-109238GB-C22 and PID2023-147404OB-I00), and by the Xunta de Galicia (Grant ED431C 2022/44) with European Union ERDF funds, Spain. CITIC, as a Research Center accredited by the Galician University System, is funded by the “Consellería de Cultura, Educación e Universidade” of the Xunta de Galicia, Spain, with 80% provided through the ERDF Operational Programme Galicia 2014–2020 and the remaining 20% by the “Secretaría Xeral de Universidades” (Grant ED431G 2023/01).	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431C 2022/44	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431G 2023/01	es_ES
dc.identifier.citation	M. Garralda-Barrio, C. Eiras-Franco, and V. Bolón-Canedo, "Adaptive incremental transfer learning for efficient performance modeling of big data workloads", Future Generation Computer Systems, Vol. 166, May 2025, 107730, doi: 10.1016/j.future.2025.107730	es_ES
dc.identifier.doi	10.1016/j.future.2025.107730
dc.identifier.uri	http://hdl.handle.net/2183/40974
dc.language.iso	eng	es_ES
dc.publisher	Elsevier	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2019-109238GB-C22/ES/APRENDIZAJE AUTOMATICO ESCALABLE Y EXPLICABLE	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2023-147404OB-I00/ES/APRENDIZAJE AUTOMATICO FRUGAL: POTENCIANDO LA IA EN ENTORNOS CON RECURSOS LIMITADOS PARA LOS DESAFIOS DEL MUNDO REAL	es_ES
dc.relation.uri	https://doi.org/10.1016/j.future.2025.107730	es_ES
dc.rights	© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.	es_ES
dc.rights.accessRights	embargoed access	es_ES
dc.subject	Performance modeling	es_ES
dc.subject	Big data	es_ES
dc.subject	Machine learning	es_ES
dc.subject	Apache Spark	es_ES
dc.subject	Distributed computing	es_ES
dc.title	Adaptive incremental transfer learning for efficient performance modeling of big data workloads	es_ES
dc.type	journal article	es_ES
dspace.entity.type	Publication
relation.isAuthorOfPublication	ca60a4d3-b38f-4d91-bfa6-f855a8e171ab
relation.isAuthorOfPublication	c114dccd-76e4-4959-ba6b-7c7c055289b1
relation.isAuthorOfPublication.latestForDiscovery	ca60a4d3-b38f-4d91-bfa6-f855a8e171ab

Files

Original bundle

Name: EirasFranco_Carlos_2025_Adaptive_incremental_transfer_learning_for_efficient_performance_modeling_of_big_data_workloads.pdf
Size: 1.06 MB
Format: Adobe Portable Document Format
Description: Accepted version

(2027-05-01) Download

Collections

Investigación (FIC)