Use this link to cite: https://hdl.handle.net/2183/46356

Evaluación de técnicas de preentrenamiento de modelos fundacionales de tráfico de red basados en el Transformer
Authors
Vila Riveira, Sergio
Other responsibilities
Universidade da Coruña. Facultade de Informática
Abstract
[Resumen]: El desarrollo de modelos fundacionales de tráfico de red busca resolver el problema de la falta de generalización de los modelos de inteligencia artificial en este ámbito. Estos modelos se entrenan inicialmente con grandes volúmenes de datos no etiquetados y, posteriormente, se adaptan con pequeñas cantidades de datos etiquetados mediante fine-tuning. Inspirados en los avances del procesamiento del lenguaje natural, la arquitectura Transformer se ha consolidado como referencia. La fase de preentrenamiento resulta crítica, ya que define la capacidad del modelo para aprender representaciones generales que capturen patrones estadísticos y semánticos. Aunque en la literatura se han propuesto diversas técnicas, no existe consenso ni estudios comparativos que evalúen su impacto en la generalización de los modelos fundacionales de red. En este proyecto se aborda esta carencia mediante un estudio comparativo de distintas técnicas de preentrenamiento aplicadas a modelos basados en Transformers. Para garantizar la rigurosidad del proceso se ha seguido la metodología CRISP-DM. El objetivo es analizar en qué medida estas técnicas influyen en la capacidad de los modelos para transferir conocimiento a tareas específicas. Los resultados muestran que, si bien todas favorecen cierto grado de transferencia, la formulación concreta de cada técnica determina de manera significativa la utilidad de las representaciones aprendidas. Este análisis aporta una base empírica sólida para orientar el diseño de futuros modelos fundacionales, subrayando la relevancia de una elección adecuada de la estrategia de preentrenamiento. Con el fin de favorecer la reproducibilidad, el código y los modelos preentrenados han sido publicados como software libre.
[Abstract]: The development of network traffic foundation models seeks to address the challenge of limited generalization in artificial intelligence within this domain. These models are initially pretrained on large volumes of unlabelled data and subsequently adapted to specific tasks with small amounts of labelled data through fine-tuning. Inspired by advances in natural language processing, the Transformer architecture has become the prevailing choice. The pretraining stage is critical, as it defines the model’s ability to learn general representations that capture statistical and semantic patterns. Although several techniques have been proposed in the literature, no consensus or comparative studies exist regarding their impact on the generalization capabilities of network foundation models. This project addresses this gap by conducting a comparative study of different pretraining techniques applied to Transformer-based models. To ensure methodological rigor, the CRISP-DM framework was followed. The main objective is to analyse how these techniques influence the models’ ability to transfer knowledge to specific tasks. Results show that while all techniques enable a certain degree of transfer, their specific formulation significantly affects the usefulness of the learned representations. This analysis provides a solid empirical basis for guiding the design of future foundation models, highlighting the importance of selecting an appropriate pretraining strategy. To foster reproducibility, the code and pretrained models have been released as open source software.
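The abstract describes a pretrain-then-fine-tune workflow: a Transformer encoder is first trained with a self-supervised objective on unlabelled traffic and then adapted to a downstream task with a small labelled set. The sketch below is a minimal, hypothetical illustration of that paradigm in PyTorch; it is not taken from the thesis or its released code, and the byte-level tokenization, model sizes, and all names are assumptions made purely for illustration.

```python
# Hypothetical sketch: masked-token pretraining of a small Transformer encoder
# over tokenized network-traffic sequences, then fine-tuning with a classifier
# head. Sizes, vocabulary, and tokenization are illustrative assumptions only.
import torch
import torch.nn as nn

VOCAB, MASK_ID, PAD_ID = 259, 257, 258   # e.g. 256 byte values + special tokens
SEQ_LEN, D_MODEL = 64, 128

class TrafficEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL, padding_idx=PAD_ID)
        self.pos = nn.Parameter(torch.zeros(1, SEQ_LEN, D_MODEL))
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4,
                                           dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, tokens):
        return self.encoder(self.embed(tokens) + self.pos)

def mask_tokens(tokens, ratio=0.15):
    """Randomly replace a fraction of tokens with MASK_ID (BERT-style objective)."""
    masked = tokens.clone()
    mask = torch.rand_like(tokens, dtype=torch.float) < ratio
    masked[mask] = MASK_ID
    return masked, mask

# --- Pretraining on unlabelled sequences: recover the original masked tokens ---
encoder, mlm_head = TrafficEncoder(), nn.Linear(D_MODEL, VOCAB)
opt = torch.optim.Adam(list(encoder.parameters()) + list(mlm_head.parameters()), lr=1e-4)

unlabelled = torch.randint(0, 256, (32, SEQ_LEN))   # stand-in for real traffic tokens
masked, mask = mask_tokens(unlabelled)
logits = mlm_head(encoder(masked))
loss = nn.functional.cross_entropy(logits[mask], unlabelled[mask])
loss.backward()
opt.step()

# --- Fine-tuning: reuse the pretrained encoder with a small labelled set ---
clf_head = nn.Linear(D_MODEL, 5)                    # e.g. 5 traffic classes
labelled, labels = torch.randint(0, 256, (8, SEQ_LEN)), torch.randint(0, 5, (8,))
features = encoder(labelled).mean(dim=1)            # mean-pool token representations
clf_loss = nn.functional.cross_entropy(clf_head(features), labels)
```

Under these assumptions, swapping the masking objective for another self-supervised formulation (e.g. next-token prediction or contrastive objectives) only changes the pretraining loss; the fine-tuning step that reuses the encoder stays the same, which is the comparison axis the study evaluates.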
Rights
Attribution 4.0 International







