WATCHED: A Web AI Agent Tool for Combating Hate speech by Expanding Data

Piot, Paloma; Sánchez, Diego; Parapar, Javier

UDC.coleccion	Investigación
UDC.departamento	Ciencias da Computación e Tecnoloxías da Información
UDC.grupoInv	Information Retrieval Lab (IRlab)
UDC.institutoCentro	CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación
UDC.issue	102431
UDC.journalTitle	SoftwareX
UDC.volume	32
dc.contributor.author	Piot, Paloma
dc.contributor.author	Sánchez, Diego
dc.contributor.author	Parapar, Javier
dc.date.accessioned	2026-01-23T12:05:38Z
dc.date.available	2026-01-23T12:05:38Z
dc.date.issued	2025-12
dc.description	Original software publication. Permanent link to code/repository used for this code version: https://github.com/ElsevierSoftwareX/SOFTX-D-25-00589 Permanent link to Reproducible Capsule: https://github.com/nulldiego/watched
dc.description.abstract	[Abstract]: Online harms are a growing problem in digital spaces, putting user safety at risk and reducing trust in social media platforms. One of the most persistent forms of harm is hate speech. To address this, we need tools that combine the speed and scale of automated systems with the judgement and insight of human moderators. These tools should not only find harmful content but also explain their decisions clearly, helping to build trust and understanding. In this paper, we present WATCHED, a chatbot designed to support content moderators in tackling hate speech. The chatbot is built as an Artificial Intelligence Agent system that uses Large Language Models along with several specialised tools. It compares new posts with real examples of hate speech and neutral content, uses a BERT-based classifier to help flag harmful messages, looks up slang and informal language using sources like Urban Dictionary, generates chain-of-thought reasoning, and checks platform guidelines to explain and support its decisions. This combination allows the chatbot not only to detect hate speech but also to explain why content is considered harmful, grounded in both precedent and policy. Experimental results show that our proposed method surpasses existing state-of-the-art methods, reaching a macro F1 score of 0.91. Designed for moderators, safety teams, and researchers, the tool helps reduce online harms by supporting collaboration between AI and human oversight.
dc.description.sponsorship	The authors thank the funding from the Horizon Europe research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 101073351. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency (REA). Neither the European Union nor the granting authority can be held responsible for them. The authors thank the financial support supplied by the grant PID2022-137061OB-C21 funded by MICIU/AEI/10.13039/501100011033 and by “ERDF/EU”. The authors also thank the funding supplied by the Consellería de Cultura, Educación, Formación Profesional e Universidades (accreditations ED431G 2023/01 and ED431C 2025/49) and the European Regional Development Fund. CITIC, as a centre accredited for excellence within the Galician University System and a member of the CIGUS Network, receives subsidies from the Department of Education, Science, Universities, and Vocational Training of the Xunta de Galicia. Additionally, it is co-financed by the EU through the FEDER Galicia 2021-27 operational programme (Ref. ED431G 2023/01).
dc.description.sponsorship	Xunta de Galicia; ED431G 2023/01
dc.description.sponsorship	Xunta de Galicia; ED431C 2025/49
dc.identifier.citation	P. Piot, D. Sánchez, and J. Parapar, "WATCHED: A Web AI Agent Tool for Combating Hate speech by Expanding Data", SoftwareX, Vol. 32, Dec. 2025, 102431, https://doi.org/10.1016/j.softx.2025.102431
dc.identifier.doi	10.1016/j.softx.2025.102431
dc.identifier.issn	2352-7110
dc.identifier.uri	https://hdl.handle.net/2183/47077
dc.language.iso	eng
dc.publisher	Elsevier
dc.relation.projectID	info:eu-repo/grantAgreement/EC/HE/101073351
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica, Técnica y de Innovación 2021-2023/PID2022-137061OB-C21/ES/BUSQUEDA, SELECCION Y ORGANIZACION DE CONTENIDOS PARA NECESIDADES DE INFORMACION RELACIONADAS CON LA SALUD - CONSTRUCCION DE RECURSOS Y PERSONALIZACION
dc.relation.uri	https://doi.org/10.1016/j.softx.2025.102431
dc.rights	Attribution-NonCommercial 4.0 International	en
dc.rights.accessRights	open access
dc.rights.uri	http://creativecommons.org/licenses/by-nc/4.0/
dc.subject	Hate speech
dc.subject	AI agent
dc.subject	RAG
dc.subject	LLMs
dc.title	WATCHED: A Web AI Agent Tool for Combating Hate speech by Expanding Data
dc.type	journal article
dc.type.hasVersion	VoR
dspace.entity.type	Publication
relation.isAuthorOfPublication	0563c6c3-cd50-4d7d-b11f-127ee297dd6b
relation.isAuthorOfPublication	fef1a9cb-e346-4e53-9811-192e144f09d0
relation.isAuthorOfPublication.latestForDiscovery	0563c6c3-cd50-4d7d-b11f-127ee297dd6b
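
The abstract in this record reports a macro F1 score of 0.91. As a reminder of what that metric measures, here is a minimal sketch in Python. The function name `macro_f1` and the toy labels are assumptions introduced for illustration only; they are not taken from the paper or its code, and the evaluation protocol used by the authors may differ.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 1 = hate speech, 0 = neutral.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
print(f"{score:.3f}")  # prints 0.733
```

Because every class contributes equally to the average, macro F1 does not let a large neutral class mask weak performance on the (typically smaller) hate speech class.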

Files

Original bundle

Name: Parapar_Javier_2025_WATCHED.pdf
Size: 1.71 MB
Format: Adobe Portable Document Format

Collections

Investigación (FIC)