Semantic Hierarchical Classification Applied to Anomaly Detection Using System Logs with a BERT Model
Use this link to cite
http://hdl.handle.net/2183/38332
Except where otherwise noted, this item's license is described as Attribution 4.0 International (CC-BY 4.0)
Metadata
Title
Semantic Hierarchical Classification Applied to Anomaly Detection Using System Logs with a BERT Model
Date
2024-06
Bibliographic citation
Corbelle, C.; Carneiro, V.; Cacheda, F. Semantic Hierarchical Classification Applied to Anomaly Detection Using System Logs with a BERT Model. Appl. Sci. 2024, 14, 5388. https://doi.org/10.3390/app14135388
Abstract
[Abstract]: The compaction and structuring of system logs facilitate and expedite anomaly and cyberattack detection processes using machine-learning techniques, while simultaneously reducing alert fatigue caused by false positives. In this work, we implemented an innovative algorithm that employs hierarchical codes based on the semantics of natural language, enabling the generation of a significantly reduced log that preserves the semantics of the original. This method uses codes that reflect the specificity of the topic and its position within a higher hierarchical structure. By applying this catalog to the analysis of logs from the Hadoop Distributed File System (HDFS), we achieved a concise summary with non-repetitive themes, significantly speeding up log analysis and resulting in a substantial reduction in log size while maintaining high semantic similarity. The resulting log has been validated for anomaly detection using the “bert-base-uncased” model and compared with six other methods: PCA, IM, LogCluster, SVM, DeepLog, and LogRobust. The reduced log achieved very similar values in precision, recall, and F1-score metrics, but drastically reduced processing time.
Keywords
System logs
Anomaly detection
BERT model
Hierarchical codes
Semantic similarity
Publisher's version
Rights
Attribution 4.0 International (CC-BY 4.0)
ISSN
2076-3417