Toward Human-Centered Explainability: Natural Language Explanations for Anomaly Detection


Authors

Padín Torrente, Héctor
Ortega-Fernández, Inés

Bibliographic citation

Padín-Torrente, H., Carneiro-Diaz, V. & Ortega-Fernandez, I. Toward Human-Centered Explainability: Natural Language Explanations for Anomaly Detection. Inf Syst Front (2026). https://doi.org/10.1007/s10796-026-10717-3

Abstract

This paper proposes a human-centered explainable artificial intelligence pipeline for anomaly detection, designed to generate meaningful, context-aware explanations using local large language models. The proposed pipeline translates model outputs and SHAP-based feature attributions into natural language explanations for cybersecurity alerts generated by an autoencoder within an enterprise network. It incorporates a human-in-the-loop component to ground the explanations in validated expert knowledge, enhancing their interpretability and alignment with human decision-making processes. Using a rubric-driven LLM-as-a-Judge evaluation, we benchmark several large language models and show that as smaller models receive more contextual grounding through human-in-the-loop, their explanatory performance improves significantly, narrowing the gap with larger models while maintaining substantially lower computational demands. Our approach provides targeted, context-aware explanations designed to meet the cognitive and operational needs of security analysts, contributing to more ethical, trustworthy, and resource-efficient AI integration in critical cybersecurity environments.
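The core step the abstract describes, translating an autoencoder's anomaly score and SHAP feature attributions into a natural-language prompt for a local LLM, can be illustrated with a minimal sketch. All names here (`build_explanation_prompt`, the example feature names and values) are illustrative assumptions, not taken from the paper's released code:

```python
# Hypothetical sketch: rendering an autoencoder anomaly score and
# precomputed SHAP attributions as a prompt for a local LLM.
# Feature names/values below are invented for illustration only.

def build_explanation_prompt(anomaly_score, shap_values, top_k=3):
    """Select the top-k features by |SHAP value| and render a prompt."""
    top = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    lines = [f"- {name}: SHAP contribution {value:+.3f}" for name, value in top]
    return (
        f"An autoencoder flagged a network flow as anomalous "
        f"(reconstruction error = {anomaly_score:.3f}).\n"
        "The most influential features were:\n"
        + "\n".join(lines)
        + "\nExplain in plain language why this flow may be malicious."
    )

prompt = build_explanation_prompt(
    anomaly_score=0.87,
    shap_values={
        "Flow Duration": 0.41,
        "Fwd Packets/s": -0.05,
        "Dst Port": 0.22,
        "Pkt Len Mean": 0.02,
    },
)
print(prompt)
```

In the pipeline described above, a prompt like this would be sent to a local LLM, with the human-in-the-loop component supplying additional validated context before generation.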

Description

The benchmark dataset analyzed in this study is CSE-CIC-IDS2018, maintained by the Canadian Institute for Cybersecurity, University of New Brunswick, and openly accessible at https://www.unb.ca/cic/datasets/ids-2018.html. The evaluation data used to assess the models and the various data generation strategies consist of the predictions produced by the autoencoder, combined with SHAP outputs, for the attack instances in the dataset. This evaluation data, together with prompt details, is available on GitHub (https://github.com/Gradiant/Natural-Language-Explanations-4-AD).

Rights

Attribution 4.0 International

Except where otherwise noted, this item's license is described as Attribution 4.0 International