Adaptive incremental transfer learning for efficient performance modeling of big data workloads

Not accessible until 2027-05-01
Use this link to cite
http://hdl.handle.net/2183/40974
Collections
- Research (FIC) [1634]
Metadata
Title
Adaptive incremental transfer learning for efficient performance modeling of big data workloads
Date
2025-05
Bibliographic citation
M. Garralda-Barrio, C. Eiras-Franco, and V. Bolón-Canedo, "Adaptive incremental transfer learning for efficient performance modeling of big data workloads", Future Generation Computer Systems, Vol. 166, May 2025, 107730, doi: 10.1016/j.future.2025.107730
Abstract
[Abstract]: The rise of data-intensive scalable computing systems, such as Apache Spark, has transformed data processing by enabling the efficient manipulation of large datasets across machine clusters. However, system configuration to optimize performance remains a challenge. This paper introduces an adaptive incremental transfer learning approach to predicting workload execution times. By integrating both unsupervised and supervised learning, we develop models that adapt incrementally to new workloads and configurations. To guide the optimal selection of relevant workloads, the model employs the coefficient of distance variation (CdV) and the coefficient of quality correlation (CqC), combined in the exploration–exploitation balance coefficient (EEBC). Comprehensive evaluations demonstrate the robustness and reliability of our model for performance modeling in Spark applications, with average improvements of up to 31% over state-of-the-art methods. This research contributes to efficient performance tuning systems by enabling transfer learning from historical workloads to new, previously unseen workloads. The full source code is openly available.
Keywords
Performance modeling
Big data
Machine learning
Apache Spark
Distributed computing
Description
This manuscript version is made available under the CC-BY-NC-ND 4.0 license https://creativecommons.org/licenses/by-nc-nd/4.0/. This version of the article has been accepted for publication in Future Generation Computer Systems. The Version of Record is available online at https://doi.org/10.1016/j.future.2025.107730. The algorithms, evaluation metrics, cross-validation methods, and resources used in this study are openly available at https://github.com/mgarralda/garralda-performance-model. Further details can be obtained from the corresponding author on reasonable request.
Publisher's version
Rights
© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.