Document-level event extraction from Italian crime news using minimal data

UDC.coleccion: Research
UDC.departamento: Ciencias da Computación e Tecnoloxías da Información
UDC.grupoInv: Lingua e Sociedade da Información (LYS)
UDC.institutoCentro: CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación
UDC.journalTitle: Knowledge-Based Systems
UDC.startPage: 113386
UDC.volume: 317
dc.contributor.author: Bonisoli, Giovanni
dc.contributor.author: Vilares, David
dc.contributor.author: Rollo, Federica
dc.contributor.author: Po, Laura
dc.date.accessioned: 2025-05-13T09:43:43Z
dc.date.available: 2025-05-13T09:43:43Z
dc.date.issued: 2025-05-23
dc.description: Funded for open access publication: Universidade da Coruña/CISUG
dc.description.abstract: [Abstract]: Event extraction from unstructured text is a critical task in natural language processing, often requiring substantial annotated data. This study presents an approach to document-level event extraction applied to Italian crime news, utilizing large language models (LLMs) with minimal labeled data. Our method leverages zero-shot prompting and in-context learning to effectively extract relevant event information. We address three key challenges: (1) identifying text spans corresponding to event entities, (2) associating related spans dispersed throughout the text with the same entity, and (3) formatting the extracted data into a structured JSON. The findings are promising: LLMs achieve an F1-score of approximately 60% for detecting event-related text spans, demonstrating their potential even in resource-constrained settings. This work represents a significant advancement in utilizing LLMs for tasks traditionally dependent on extensive data, showing that meaningful results are achievable with minimal data annotation. Additionally, the proposed approach outperforms several baselines, confirming its robustness and adaptability to various event extraction scenarios.
dc.description.sponsorship: This work has received support by Grant GAP (PID2022-139308OA-I00) funded by MCIN/AEI/10.13039/501100011033/ and by ERDF, EU; by Grant SCANNER-UDC (PID2020-113230RB-C21) funded by MICIU/AEI/10.13039/501100011033/; by Xunta de Galicia (ED431C 2024/02); and by Centro de Investigación de Galicia “CITIC”, funded by the Xunta de Galicia through the collaboration agreement between the Consellería de Cultura, Educación, Formación Profesional e Universidades and the Galician universities for the reinforcement of the research centers of the Galician University System (CIGUS); by Mobility of higher education students and staff supported by internal policy funds (2022-1-IT02-KA131-HED-000064316) funded by EU through the Erasmus+ Programme and by UNIMORE; by Ministry for Digital Transformation and Civil Service and ‘Next-GenerationEU’/PRTR under Grant TSI-100925-2023-1.
dc.description.sponsorship: Xunta de Galicia; ED431C 2024/02
dc.identifier.citation: G. Bonisoli, D. Vilares, F. Rollo, and L. Po, "Document-level event extraction from Italian crime news using minimal data", Knowledge-Based Systems, vol. 317, p. 113386, May 2025, doi: 10.1016/j.knosys.2025.113386
dc.identifier.issn: 1872-7409
dc.identifier.issn: 0950-7051
dc.identifier.uri: http://hdl.handle.net/2183/41978
dc.language.iso: eng
dc.publisher: Elsevier
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-139308OA-I00/ES/REPRESENTACIONES ESTRUCTURADAS VERDES Y ENCHUFABLES
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113230RB-C21/ES/MODELOS MULTITAREA DE ETIQUETADO SECUENCIAL PARA EL RECONOCIMIENTO DE ENTIDADES ENRIQUECIDO CON INFORMACIÓN LINGÜÍSTICA: SINTAXIS E INTEGRACIÓN MULTITAREA (SCANNER-UDC)
dc.relation.uri: https://doi.org/10.1016/j.knosys.2025.113386
dc.rights: Attribution 3.0 Spain
dc.rights: © 2025 The Authors. Published by Elsevier B.V.
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/
dc.subject: Event extraction
dc.subject: Large language models
dc.subject: In-context prompting
dc.subject: Few-shot learning
dc.subject: Prompt tuning
dc.subject: Crime news
dc.subject: Information extraction
dc.title: Document-level event extraction from Italian crime news using minimal data
dc.type: journal article
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: 37dabbe9-f54f-43bb-960e-0bf3ac7e54eb
relation.isAuthorOfPublication.latestForDiscovery: 37dabbe9-f54f-43bb-960e-0bf3ac7e54eb

Files

Original bundle

Name: Bonisoli_Giovanni_2025_Document-level_event_extraction.pdf
Size: 3.49 MB
Format: Adobe Portable Document Format