Toward Human-Centered Explainability: Natural Language Explanations for Anomaly Detection


Authors

Padín Torrente, Héctor
Ortega-Fernández, Inés

Bibliographic citation

Padín-Torrente, H., Carneiro-Diaz, V. & Ortega-Fernandez, I. Toward Human-Centered Explainability: Natural Language Explanations for Anomaly Detection. Inf Syst Front (2026). https://doi.org/10.1007/s10796-026-10717-3

Abstract

This paper proposes a human-centered explainable artificial intelligence pipeline for anomaly detection, designed to generate meaningful, context-aware explanations using local large language models. The proposed pipeline translates model outputs and SHAP-based feature attributions into natural language explanations for cybersecurity alerts generated by an autoencoder within an enterprise network. It incorporates a human-in-the-loop component to ground the explanations in validated expert knowledge, enhancing their interpretability and alignment with human decision-making processes. Using a rubric-driven LLM-as-a-Judge evaluation, we benchmark several large language models and show that as smaller models receive more contextual grounding through human-in-the-loop, their explanatory performance improves significantly, narrowing the gap with larger models while maintaining substantially lower computational demands. Our approach provides targeted, context-aware explanations designed to meet the cognitive and operational needs of security analysts, contributing to more ethical, trustworthy, and resource-efficient AI integration in critical cybersecurity environments.
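The core step the abstract describes, translating an autoencoder's anomaly score and SHAP feature attributions into a natural-language prompt for a local LLM, can be illustrated with a minimal sketch. All names here (`build_explanation_prompt`, the example feature names and values) are illustrative assumptions, not taken from the paper's released code:

```python
# Hypothetical sketch: rendering an autoencoder anomaly score and
# precomputed SHAP attributions as a prompt for a local LLM.
# Feature names/values below are invented for illustration only.

def build_explanation_prompt(anomaly_score, shap_values, top_k=3):
    """Select the top-k features by |SHAP value| and render a prompt."""
    top = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_k]
    lines = [f"- {name}: SHAP contribution {value:+.3f}" for name, value in top]
    return (
        f"An autoencoder flagged a network flow as anomalous "
        f"(reconstruction error = {anomaly_score:.3f}).\n"
        "The most influential features were:\n"
        + "\n".join(lines)
        + "\nExplain in plain language why this flow may be malicious."
    )

prompt = build_explanation_prompt(
    anomaly_score=0.87,
    shap_values={
        "Flow Duration": 0.41,
        "Fwd Packets/s": -0.05,
        "Dst Port": 0.22,
        "Pkt Len Mean": 0.02,
    },
)
print(prompt)
```

In the pipeline described above, a prompt like this would be sent to a local LLM, with the human-in-the-loop component supplying additional validated context before generation.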

Description

The benchmark dataset analyzed in this study is CSE-CIC-IDS2018, maintained by the Canadian Institute for Cybersecurity, University of New Brunswick, and openly accessible at https://www.unb.ca/cic/datasets/ids-2018.html. The evaluation data used to assess the models and the various data generation strategies consist of the predictions produced by the autoencoder, combined with SHAP outputs, for the attack instances in the dataset. This evaluation data, together with prompt details, is available on GitHub (https://github.com/Gradiant/Natural-Language-Explanations-4-AD).

Rights

Attribution 4.0 International

Except where otherwise noted, this item's license is described as Attribution 4.0 International