WATCHED: A Web AI Agent Tool for Combating Hate speech by Expanding Data
| UDC.coleccion | Research | |
| UDC.departamento | Ciencias da Computación e Tecnoloxías da Información | |
| UDC.grupoInv | Information Retrieval Lab (IRlab) | |
| UDC.institutoCentro | CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación | |
| UDC.issue | 102431 | |
| UDC.journalTitle | SoftwareX | |
| UDC.volume | 32 | |
| dc.contributor.author | Piot, Paloma | |
| dc.contributor.author | Sánchez, Diego | |
| dc.contributor.author | Parapar, Javier | |
| dc.date.accessioned | 2026-01-23T12:05:38Z | |
| dc.date.available | 2026-01-23T12:05:38Z | |
| dc.date.issued | 2025-12 | |
| dc.description | Original software publication. Permanent link to code/repository used for this code version: https://github.com/ElsevierSoftwareX/SOFTX-D-25-00589. Permanent link to Reproducible Capsule: https://github.com/nulldiego/watched | |
| dc.description.abstract | [Abstract]: Online harms are a growing problem in digital spaces, putting user safety at risk and reducing trust in social media platforms. One of the most persistent forms of harm is hate speech. To address this, we need tools that combine the speed and scale of automated systems with the judgement and insight of human moderators. These tools should not only find harmful content but also explain their decisions clearly, helping to build trust and understanding. In this paper, we present WATCHED, a chatbot designed to support content moderators in tackling hate speech. The chatbot is built as an Artificial Intelligence Agent system that uses Large Language Models along with several specialised tools. It compares new posts with real examples of hate speech and neutral content, uses a BERT-based classifier to help flag harmful messages, looks up slang and informal language using sources like Urban Dictionary, generates chain-of-thought reasoning, and checks platform guidelines to explain and support its decisions. This combination allows the chatbot not only to detect hate speech but also to explain why content is considered harmful, grounded in both precedent and policy. Experimental results show that our proposed method surpasses existing state-of-the-art methods, reaching a macro F1 score of 0.91. Designed for moderators, safety teams, and researchers, the tool helps reduce online harms by supporting collaboration between AI and human oversight. | |
| dc.description.sponsorship | The authors thank the funding from the Horizon Europe research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 101073351. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency (REA). Neither the European Union nor the granting authority can be held responsible for them. The authors thank the financial support supplied by the grant PID2022-137061OB-C21 funded by MICIU/AEI/10.13039/501100011033 and by “ERDF/EU”. The authors also thank the funding supplied by the Consellería de Cultura, Educación, Formación Profesional e Universidades (accreditations ED431G 2023/01 and ED431C 2025/49) and the European Regional Development Fund, which acknowledges the CITIC, as a centre accredited for excellence within the Galician University System and a member of the CIGUS Network, receives subsidies from the Department of Education, Science, Universities, and Vocational Training of the Xunta de Galicia. Additionally, it is co-financed by the EU through the FEDER Galicia 2021-27 operational programme (Ref. ED431G 2023/01). | |
| dc.description.sponsorship | Xunta de Galicia; ED431G 2023/01 | |
| dc.description.sponsorship | Xunta de Galicia; ED431C 2025/49 | |
| dc.identifier.citation | P. Piot, D. Sánchez, and J. Parapar, "WATCHED: A Web AI Agent Tool for Combating Hate speech by Expanding Data", SoftwareX, Vol. 32, Dec. 2025, 102431, https://doi.org/10.1016/j.softx.2025.102431 | |
| dc.identifier.doi | 10.1016/j.softx.2025.102431 | |
| dc.identifier.issn | 2352-7110 | |
| dc.identifier.uri | https://hdl.handle.net/2183/47077 | |
| dc.language.iso | eng | |
| dc.publisher | Elsevier | |
| dc.relation.projectID | info:eu-repo/grantAgreement/EC/HE/101073351 | |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica, Técnica y de Innovación 2021-2023/PID2022-137061OB-C21/ES/BUSQUEDA, SELECCION Y ORGANIZACION DE CONTENIDOS PARA NECESIDADES DE INFORMACION RELACIONADAS CON LA SALUD - CONSTRUCCION DE RECURSOS Y PERSONALIZACION | |
| dc.relation.uri | https://doi.org/10.1016/j.softx.2025.102431 | |
| dc.rights | Attribution-NonCommercial 4.0 International | en |
| dc.rights.accessRights | open access | |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc/4.0/ | |
| dc.subject | Hate speech | |
| dc.subject | AI agent | |
| dc.subject | RAG | |
| dc.subject | LLMs | |
| dc.title | WATCHED: A Web AI Agent Tool for Combating Hate speech by Expanding Data | |
| dc.type | journal article | |
| dc.type.hasVersion | VoR | |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | 0563c6c3-cd50-4d7d-b11f-127ee297dd6b | |
| relation.isAuthorOfPublication | fef1a9cb-e346-4e53-9811-192e144f09d0 | |
| relation.isAuthorOfPublication.latestForDiscovery | 0563c6c3-cd50-4d7d-b11f-127ee297dd6b |
Files
Original bundle
- Name: Parapar_Javier_2025_WATCHED.pdf
- Size: 1.71 MB
- Format: Adobe Portable Document Format

