WATCHED: A Web AI Agent Tool for Combating Hate speech by Expanding Data

UDC.coleccion: Investigación
UDC.departamento: Ciencias da Computación e Tecnoloxías da Información
UDC.grupoInv: Information Retrieval Lab (IRlab)
UDC.institutoCentro: CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación
UDC.issue: 102431
UDC.journalTitle: SoftwareX
UDC.volume: 32
dc.contributor.author: Piot, Paloma
dc.contributor.author: Sánchez, Diego
dc.contributor.author: Parapar, Javier
dc.date.accessioned: 2026-01-23T12:05:38Z
dc.date.available: 2026-01-23T12:05:38Z
dc.date.issued: 2025-12
dc.description: Original software publication. Permanent link to code/repository used for this code version: https://github.com/ElsevierSoftwareX/SOFTX-D-25-00589; permanent link to Reproducible Capsule: https://github.com/nulldiego/watched
dc.description.abstract: [Abstract]: Online harms are a growing problem in digital spaces, putting user safety at risk and reducing trust in social media platforms. One of the most persistent forms of harm is hate speech. To address this, we need tools that combine the speed and scale of automated systems with the judgement and insight of human moderators. These tools should not only find harmful content but also explain their decisions clearly, helping to build trust and understanding. In this paper, we present WATCHED, a chatbot designed to support content moderators in tackling hate speech. The chatbot is built as an Artificial Intelligence Agent system that uses Large Language Models along with several specialised tools. It compares new posts with real examples of hate speech and neutral content, uses a BERT-based classifier to help flag harmful messages, looks up slang and informal language using sources like Urban Dictionary, generates chain-of-thought reasoning, and checks platform guidelines to explain and support its decisions. This combination allows the chatbot not only to detect hate speech but also to explain why content is considered harmful, grounded in both precedent and policy. Experimental results show that our proposed method surpasses existing state-of-the-art methods, reaching a macro F1 score of 0.91. Designed for moderators, safety teams, and researchers, the tool helps reduce online harms by supporting collaboration between AI and human oversight.
dc.description.sponsorship: The authors thank the funding from the Horizon Europe research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 101073351. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency (REA). Neither the European Union nor the granting authority can be held responsible for them. The authors thank the financial support supplied by the grant PID2022-137061OB-C21 funded by MICIU/AEI/10.13039/501100011033 and by “ERDF/EU”. The authors also thank the funding supplied by the Consellería de Cultura, Educación, Formación Profesional e Universidades (accreditations ED431G 2023/01 and ED431C 2025/49) and the European Regional Development Fund. CITIC, as a centre accredited for excellence within the Galician University System and a member of the CIGUS Network, receives subsidies from the Department of Education, Science, Universities, and Vocational Training of the Xunta de Galicia. Additionally, it is co-financed by the EU through the FEDER Galicia 2021-27 operational programme (Ref. ED431G 2023/01).
dc.description.sponsorship: Xunta de Galicia; ED431G 2023/01
dc.description.sponsorship: Xunta de Galicia; ED431C 2025/49
dc.identifier.citation: P. Piot, D. Sánchez, and J. Parapar, "WATCHED: A Web AI Agent Tool for Combating Hate speech by Expanding Data", SoftwareX, Vol. 32, Dec. 2025, 102431, https://doi.org/10.1016/j.softx.2025.102431
dc.identifier.doi: 10.1016/j.softx.2025.102431
dc.identifier.issn: 2352-7110
dc.identifier.uri: https://hdl.handle.net/2183/47077
dc.language.iso: eng
dc.publisher: Elsevier
dc.relation.projectID: info:eu-repo/grantAgreement/EC/HE/101073351
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica, Técnica y de Innovación 2021-2023/PID2022-137061OB-C21/ES/BUSQUEDA, SELECCION Y ORGANIZACION DE CONTENIDOS PARA NECESIDADES DE INFORMACION RELACIONADAS CON LA SALUD - CONSTRUCCION DE RECURSOS Y PERSONALIZACION
dc.relation.uri: https://doi.org/10.1016/j.softx.2025.102431
dc.rights: Attribution-NonCommercial 4.0 International
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: Hate speech
dc.subject: AI agent
dc.subject: RAG
dc.subject: LLMs
dc.title: WATCHED: A Web AI Agent Tool for Combating Hate speech by Expanding Data
dc.type: journal article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: 0563c6c3-cd50-4d7d-b11f-127ee297dd6b
relation.isAuthorOfPublication: fef1a9cb-e346-4e53-9811-192e144f09d0
relation.isAuthorOfPublication.latestForDiscovery: 0563c6c3-cd50-4d7d-b11f-127ee297dd6b

Files

Original bundle

Name: Parapar_Javier_2025_WATCHED.pdf
Size: 1.71 MB
Format: Adobe Portable Document Format