Use this link to cite: http://hdl.handle.net/2183/32837

Aplicación de técnicas de clustering como paso previo a la detección de anomalías en redes definidas por software (Application of clustering techniques as a preliminary step to anomaly detection in software-defined networks)
Authors
Prada Conde, Luis
Other responsibilities
Universidade da Coruña. Facultade de Informática
Abstract
[Abstract]: The purpose of this project is the detection of anomalies in the data flow of a software-defined network, following a procedure guided by the CRISP-DM methodology. The starting point is an unlabelled data set containing different observations of the traffic flow of the software-defined network, which is examined and analysed. Feature engineering is applied to this set, using selected technologies and transforming the data to obtain a set suitable for analysis in the subsequent phases. Once the data reach this stage, the candidate machine learning algorithms are studied. The best combination of parameters for each algorithm is then sought, comparing the algorithms against one another and producing models that are as close to optimal as possible, capable of grouping data samples with similar characteristics and detecting anomalies in the flow, thus meeting the established objectives. The resulting models are evaluated through the scores of the selected internal metrics. Finally, the algorithms are compared on the basis of the results, execution times and ease of understanding, highlighting the best-performing and most efficient one.
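The pipeline the abstract describes, clustering flow samples, scoring the partition with an internal metric, and flagging outlying samples, can be sketched as follows. This is an illustrative stand-in, not the thesis's actual code: the toy features, the k-means variant, the silhouette metric, and the distance-based anomaly rule are all assumptions chosen for the example.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialisation.

    A minimal stand-in for the clustering step; the thesis's actual
    algorithms and parameter choices are not reproduced here.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from all chosen centroids as the next seed.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centroids[i] = tuple(sum(dim) / len(members) for dim in zip(*members))
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return labels, centroids

def silhouette(points, labels):
    """Mean silhouette coefficient, one common internal validation metric."""
    k = max(labels) + 1
    clusters = [[p for p, lab in zip(points, labels) if lab == i] for i in range(k)]
    scores = []
    for p, lab in zip(points, labels):
        own = [q for q in clusters[lab] if q != p]
        if not own:
            scores.append(0.0)  # silhouette is defined as 0 for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)   # mean intra-cluster distance
        b = min(sum(math.dist(p, q) for q in c) / len(c)   # mean distance to nearest other cluster
                for i, c in enumerate(clusters) if i != lab and c)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def flag_anomalies(points, labels, centroids, z=2.0):
    """Illustrative rule (an assumption, not the thesis's criterion): a sample
    is anomalous if it lies more than z standard deviations farther from its
    own centroid than the average sample does."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    mean = sum(dists) / len(dists)
    std = (sum((d - mean) ** 2 for d in dists) / len(dists)) ** 0.5
    return [i for i, d in enumerate(dists) if d > mean + z * std]

# Toy "flow features" (e.g. packet rate, mean packet size): two normal groups
# plus one drifting sample. Purely synthetic, not the thesis's data set.
flows = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
         (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
         (8.0, 5.0)]
labels, centroids = kmeans(flows, k=2)
# Prints a silhouette close to 1 (compact, well-separated groups) and
# flags index 8, the drifting sample.
print("silhouette:", round(silhouette(flows, labels), 2))
print("anomalous sample indices:", flag_anomalies(flows, labels, centroids))
```

As in the abstract's evaluation phase, the internal metric needs no ground-truth labels: a higher silhouette simply indicates a tighter, better-separated partition, and the anomaly rule then operates on the fitted clusters.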
Rights
Atribución-NoComercial 3.0 España (Attribution-NonCommercial 3.0 Spain)