A Hybrid Metaheuristics-Bayesian Optimization Framework with Safe Transfer Learning for Continuous Spark Tuning

UDC.coleccion: Investigación
UDC.departamento: Ciencias da Computación e Tecnoloxías da Información
UDC.grupoInv: Laboratorio de Investigación e Desenvolvemento en Intelixencia Artificial (LIDIA)
UDC.institutoCentro: CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación
UDC.issue: 108325
UDC.journalTitle: Future Generation Computer Systems
UDC.volume: 178
dc.contributor.author: Garralda-Barrio, Mariano
dc.contributor.author: Eiras-Franco, Carlos
dc.contributor.author: Bolón-Canedo, Verónica
dc.date.accessioned: 2026-03-05T10:01:52Z
dc.date.available: 2026-03-05T10:01:52Z
dc.date.issued: 2025-12
dc.description: The source code and experimental resources supporting this study are openly available on GitHub (https://github.com/mgarralda/garralda-performance-model). They are released for academic and research use under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
dc.description.abstract: [Abstract]: Tuning configuration parameters in distributed Big Data engines such as Apache Spark is a high-dimensional, workload-dependent problem with significant impact on performance and operational cost. We address this challenge with a hybrid optimization framework that integrates Iterated Local Search, Tabu Search, and locally embedded Bayesian Optimization guided by STL-PARN (safe transfer learning with pattern-adaptive robust neighborhoods). Historical executions are partitioned into a Nucleus of reliable neighbors and a Corona of exploratory configurations, ensuring relevance while mitigating negative transfer. The surrogate within the embedded Bayesian Optimization stage decouples performance prediction from uncertainty modeling, enabling parameter-free acquisition functions that self-adapt to diverse workloads. Experiments on a modernized HiBench suite across multiple input scales show consistent gains over state-of-the-art baselines in execution time, convergence, and cost efficiency. Overall, the results demonstrate the robustness and practical value of embedding Bayesian Optimization within a global metaheuristic loop for adaptive, cost-aware Spark tuning. All source code and datasets are publicly available, supporting reproducibility and operational efficiency in large-scale data processing.
dc.description.sponsorship: The authors extend their gratitude to Laboratorio Innovación Aplicada (LIA) at Minsait (Indra Company) for their support in this study. This work has been supported by Ministerio de Ciencia e Innovación MCIN/AEI/10.13039/501100011033/ FEDER, UE under grant PID2023-147404OB-I00, and by the Ministry for Digital Transformation and Civil Service and ‘NextGenerationEU’/PRTR under Grant TSI-100925-2023-1. CITIC, as a center accredited for excellence within the Galician University System and a member of the CIGUS Network, receives subsidies from the Department of Education, Science, Universities, and Vocational Training of the Xunta de Galicia. Additionally, it is co-financed by the EU through the FEDER Galicia 2021-27 operational program (Ref. ED431G 2023/01). Grant ED431C 2022/44 is funded by Xunta de Galicia.
dc.description.sponsorship: Xunta de Galicia; ED431G 2023/01
dc.description.sponsorship: Xunta de Galicia; ED431C 2022/44
dc.identifier.citation: M. Garralda-Barrio, C. Eiras-Franco, and V. Bolón-Canedo, "A hybrid metaheuristics-Bayesian optimization framework with safe transfer learning for continuous Spark tuning", Future Generation Computer Systems, Vol. 178, May 2026, 108325, https://doi.org/10.1016/j.future.2025.108325
dc.identifier.doi: 10.1016/j.future.2025.108325
dc.identifier.issn: 1872-7115
dc.identifier.uri: https://hdl.handle.net/2183/47590
dc.language.iso: eng
dc.publisher: Elsevier
dc.relation.isbasedon: https://github.com/mgarralda/garralda-performance-model
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2023-147404OB-I00/ES/APRENDIZAJE AUTOMATICO FRUGAL: POTENCIANDO LA IA EN ENTORNOS CON RECURSOS LIMITADOS PARA LOS DESAFIOS DEL MUNDO REAL
dc.relation.projectID: info:eu-repo/grantAgreement/MTDPF//TSI-100925-2023-1/ES/CÁTEDRA UDC-INDITEX DE IA EN ALGORITMOS VERDES
dc.relation.uri: https://doi.org/10.1016/j.future.2025.108325
dc.rights: Attribution-NonCommercial 4.0 International
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: Spark continuous tuning
dc.subject: Bayesian optimization
dc.subject: Safe transfer learning
dc.subject: Metaheuristics
dc.subject: Big data
dc.title: A Hybrid Metaheuristics-Bayesian Optimization Framework with Safe Transfer Learning for Continuous Spark Tuning
dc.type: journal article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: ca60a4d3-b38f-4d91-bfa6-f855a8e171ab
relation.isAuthorOfPublication: c114dccd-76e4-4959-ba6b-7c7c055289b1
relation.isAuthorOfPublication.latestForDiscovery: ca60a4d3-b38f-4d91-bfa6-f855a8e171ab
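The abstract says that historical executions are partitioned into a Nucleus of reliable neighbors and a Corona of exploratory configurations. The sketch below illustrates that idea under loudly hypothetical assumptions. It assumes each past run is described by a numeric workload feature vector. It also assumes the two groups are concentric distance rings around the target workload. The names split_nucleus_corona, nucleus_size and corona_size are illustrative and do not come from the authors' repository. The actual STL-PARN selection criteria are not described in this record and may differ.

```python
import math
from typing import Sequence

def split_nucleus_corona(
    target: Sequence[float],
    history: list[tuple[Sequence[float], dict]],
    nucleus_size: int,
    corona_size: int,
) -> tuple[list[dict], list[dict]]:
    """Hypothetical sketch: split past runs into two concentric rings.

    Assumption (not from the paper): runs are ranked by Euclidean distance
    between workload feature vectors. The closest `nucleus_size` runs form
    the Nucleus, and the next `corona_size` runs form the Corona.
    """
    # Rank every past execution by its distance to the target workload.
    ranked = sorted(history, key=lambda run: math.dist(target, run[0]))
    # Innermost ring: closest neighbors, treated as reliable transfer sources.
    nucleus = [cfg for _, cfg in ranked[:nucleus_size]]
    # Outer ring: farther runs, kept only as exploratory candidates.
    corona = [cfg for _, cfg in ranked[nucleus_size:nucleus_size + corona_size]]
    return nucleus, corona

# Usage with made-up 2-D workload features and a real Spark property name.
history = [
    ((0.0, 0.0), {"spark.executor.cores": 2}),
    ((0.1, 0.0), {"spark.executor.cores": 4}),
    ((1.0, 1.0), {"spark.executor.cores": 8}),
    ((5.0, 5.0), {"spark.executor.cores": 1}),
]
nucleus, corona = split_nucleus_corona((0.0, 0.0), history, nucleus_size=2, corona_size=1)
```

Ranking by distance limits the Nucleus to the closest runs. The Corona adds a bounded set of farther configurations for exploration, and runs outside both rings are ignored. In this sketch, capping the rings stands in for the "safe" aspect of limiting negative transfer.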

Files

Original bundle

Name: EirasFranco_Carlos_2026_A_hybrid_metaheuristics.pdf
Size: 8.28 MB
Format: Adobe Portable Document Format