A Hybrid Metaheuristics-Bayesian Optimization Framework with Safe Transfer Learning for Continuous Spark Tuning
| UDC.coleccion | Investigación | |
| UDC.departamento | Ciencias da Computación e Tecnoloxías da Información | |
| UDC.grupoInv | Laboratorio de Investigación e Desenvolvemento en Intelixencia Artificial (LIDIA) | |
| UDC.institutoCentro | CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación | |
| UDC.issue | 108325 | |
| UDC.journalTitle | Future Generation Computer Systems | |
| UDC.volume | 178 | |
| dc.contributor.author | Garralda-Barrio, Mariano | |
| dc.contributor.author | Eiras-Franco, Carlos | |
| dc.contributor.author | Bolón-Canedo, Verónica | |
| dc.date.accessioned | 2026-03-05T10:01:52Z | |
| dc.date.available | 2026-03-05T10:01:52Z | |
| dc.date.issued | 2025-12 | |
| dc.description | The source code and experimental resources supporting this study are openly available on GitHub (https://github.com/mgarralda/garralda-performance-model). They are released for academic and research use under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). | |
| dc.description.abstract | [Abstract]: Tuning configuration parameters in distributed Big Data engines such as Apache Spark is a high-dimensional, workload-dependent problem with significant impact on performance and operational cost. We address this challenge with a hybrid optimization framework that integrates Iterated Local Search, Tabu Search, and locally embedded Bayesian Optimization guided by STL-PARN (safe transfer learning with pattern-adaptive robust neighborhoods). Historical executions are partitioned into a Nucleus of reliable neighbors and a Corona of exploratory configurations, ensuring relevance while mitigating negative transfer. The surrogate within the embedded Bayesian Optimization stage decouples performance prediction from uncertainty modeling, enabling parameter-free acquisition functions that self-adapt to diverse workloads. Experiments on a modernized HiBench suite across multiple input scales show consistent gains over state-of-the-art baselines in execution time, convergence, and cost efficiency. Overall, the results demonstrate the robustness and practical value of embedding Bayesian Optimization within a global metaheuristic loop for adaptive, cost-aware Spark tuning. All source code and datasets are publicly available, supporting reproducibility and operational efficiency in large-scale data processing. | |
| dc.description.sponsorship | The authors extend their gratitude to Laboratorio Innovación Aplicada (LIA) at Minsait (Indra Company) for their support in this study. This work has been supported by Ministerio de Ciencia e Innovación MCIN/AEI/10.13039/501100011033/FEDER, UE under grant PID2023-147404OB-I00, and by the Ministry for Digital Transformation and Civil Service and ‘Next-GenerationEU’/PRTR under grant TSI-100925-2023-1. CITIC, as a center accredited for excellence within the Galician University System and a member of the CIGUS Network, receives subsidies from the Department of Education, Science, Universities, and Vocational Training of the Xunta de Galicia. Additionally, it is co-financed by the EU through the FEDER Galicia 2021-27 operational program (Ref. ED431G 2023/01). This work was also funded by Xunta de Galicia under grant ED431C 2022/44. | |
| dc.description.sponsorship | Xunta de Galicia; ED431G 2023/01 | |
| dc.description.sponsorship | Xunta de Galicia; ED431C 2022/44 | |
| dc.identifier.citation | M. Garralda-Barrio, C. Eiras-Franco, and V. Bolón-Canedo, "A hybrid metaheuristics-Bayesian optimization framework with safe transfer learning for continuous Spark tuning", Future Generation Computer Systems, Vol. 178, May 2026, 108325, https://doi.org/10.1016/j.future.2025.108325 | |
| dc.identifier.doi | 10.1016/j.future.2025.108325 | |
| dc.identifier.issn | 1872-7115 | |
| dc.identifier.uri | https://hdl.handle.net/2183/47590 | |
| dc.language.iso | eng | |
| dc.publisher | Elsevier | |
| dc.relation.isbasedon | https://github.com/mgarralda/garralda-performance-model | |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2023-147404OB-I00/ES/APRENDIZAJE AUTOMATICO FRUGAL: POTENCIANDO LA IA EN ENTORNOS CON RECURSOS LIMITADOS PARA LOS DESAFIOS DEL MUNDO REAL | |
| dc.relation.projectID | info:eu-repo/grantAgreement/MTDPF//TSI-100925-2023-1/ES/CÁTEDRA UDC-INDITEX DE IA EN ALGORITMOS VERDES | |
| dc.relation.uri | https://doi.org/10.1016/j.future.2025.108325 | |
| dc.rights | Attribution-NonCommercial 4.0 International | en |
| dc.rights.accessRights | open access | |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | |
| dc.subject | Spark continuous tuning | |
| dc.subject | Bayesian optimization | |
| dc.subject | Safe transfer learning | |
| dc.subject | Metaheuristics | |
| dc.subject | Big data | |
| dc.title | A Hybrid Metaheuristics-Bayesian Optimization Framework with Safe Transfer Learning for Continuous Spark Tuning | |
| dc.type | journal article | |
| dc.type.hasVersion | VoR | |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | ca60a4d3-b38f-4d91-bfa6-f855a8e171ab | |
| relation.isAuthorOfPublication | c114dccd-76e4-4959-ba6b-7c7c055289b1 | |
| relation.isAuthorOfPublication.latestForDiscovery | ca60a4d3-b38f-4d91-bfa6-f855a8e171ab |
Files
Original bundle
- Name: EirasFranco_Carlos_2026_A_hybrid_metaheuristics.pdf
- Size: 8.28 MB
- Format: Adobe Portable Document Format

