Document-level event extraction from Italian crime news using minimal data
| UDC.coleccion | Investigación | es_ES |
| UDC.departamento | Ciencias da Computación e Tecnoloxías da Información | es_ES |
| UDC.grupoInv | Lingua e Sociedade da Información (LYS) | es_ES |
| UDC.institutoCentro | CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación | es_ES |
| UDC.journalTitle | Knowledge-Based Systems | es_ES |
| UDC.startPage | 113386 | es_ES |
| UDC.volume | 317 | es_ES |
| dc.contributor.author | Bonisoli, Giovanni | |
| dc.contributor.author | Vilares, David | |
| dc.contributor.author | Rollo, Federica | |
| dc.contributor.author | Po, Laura | |
| dc.date.accessioned | 2025-05-13T09:43:43Z | |
| dc.date.available | 2025-05-13T09:43:43Z | |
| dc.date.issued | 2025-05-23 | |
| dc.description | Funded for open access publication: Universidade da Coruña/CISUG | es_ES |
| dc.description.abstract | [Abstract]: Event extraction from unstructured text is a critical task in natural language processing, often requiring substantial annotated data. This study presents an approach to document-level event extraction applied to Italian crime news, utilizing large language models (LLMs) with minimal labeled data. Our method leverages zero-shot prompting and in-context learning to effectively extract relevant event information. We address three key challenges: (1) identifying text spans corresponding to event entities, (2) associating related spans dispersed throughout the text with the same entity, and (3) formatting the extracted data into a structured JSON. The findings are promising: LLMs achieve an F1-score of approximately 60% for detecting event-related text spans, demonstrating their potential even in resource-constrained settings. This work represents a significant advancement in utilizing LLMs for tasks traditionally dependent on extensive data, showing that meaningful results are achievable with minimal data annotation. Additionally, the proposed approach outperforms several baselines, confirming its robustness and adaptability to various event extraction scenarios. | es_ES |
| dc.description.sponsorship | This work has received support by Grant GAP (PID2022-139308OA-I00) funded by MCIN/AEI/10.13039/501100011033/ and by ERDF, EU; by Grant SCANNER-UDC (PID2020-113230RB-C21) funded by MICIU/AEI/10.13039/501100011033/; by Xunta de Galicia (ED431C 2024/02); and by Centro de Investigación de Galicia “CITIC”, funded by the Xunta de Galicia through the collaboration agreement between the Consellería de Cultura, Educación, Formación Profesional e Universidades and the Galician universities for the reinforcement of the research centers of the Galician University System (CIGUS); by Mobility of higher education students and staff supported by internal policy funds (2022-1-IT02-KA131-HED-000064316) funded by EU through the Erasmus+ Programme and by UNIMORE; by Ministry for Digital Transformation and Civil Service and ‘Next-GenerationEU’/PRTR under Grant TSI-100925-2023-1. | es_ES |
| dc.description.sponsorship | Xunta de Galicia; ED431C 2024/02 | es_ES |
| dc.identifier.citation | G. Bonisoli, D. Vilares, F. Rollo, and L. Po, "Document-level event extraction from Italian crime news using minimal data", Knowledge-Based Systems, vol. 317, p. 113386, May 2025, doi: 10.1016/j.knosys.2025.113386 | es_ES |
| dc.identifier.issn | 1872-7409 | |
| dc.identifier.issn | 0950-7051 | |
| dc.identifier.uri | http://hdl.handle.net/2183/41978 | |
| dc.language.iso | eng | es_ES |
| dc.publisher | Elsevier | es_ES |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-139308OA-I00/ES/REPRESENTACIONES ESTRUCTURADAS VERDES Y ENCHUFABLES | es_ES |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113230RB-C21/ES/MODELOS MULTITAREA DE ETIQUETADO SECUENCIAL PARA EL RECONOCIMIENTO DE ENTIDADES ENRIQUECIDO CON INFORMACIÓN LINGÜÍSTICA: SINTAXIS E INTEGRACIÓN MULTITAREA (SCANNER-UDC) | es_ES |
| dc.relation.uri | https://doi.org/10.1016/j.knosys.2025.113386 | es_ES |
| dc.rights | Attribution 3.0 Spain | es_ES |
| dc.rights | © 2025 The Authors. Published by Elsevier B.V. | es_ES |
| dc.rights.accessRights | open access | es_ES |
| dc.rights.uri | http://creativecommons.org/licenses/by/3.0/es/ | * |
| dc.subject | Event extraction | es_ES |
| dc.subject | Large language models | es_ES |
| dc.subject | In-context prompting | es_ES |
| dc.subject | Few-shot learning | es_ES |
| dc.subject | Prompt tuning | es_ES |
| dc.subject | Crime news | es_ES |
| dc.subject | Information extraction | es_ES |
| dc.title | Document-level event extraction from Italian crime news using minimal data | es_ES |
| dc.type | journal article | es_ES |
| dc.type.hasVersion | VoR | es_ES |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | 37dabbe9-f54f-43bb-960e-0bf3ac7e54eb | |
| relation.isAuthorOfPublication.latestForDiscovery | 37dabbe9-f54f-43bb-960e-0bf3ac7e54eb | |
Files
Original bundle
- Name: Bonisoli_Giovanni_2025_Document-level_event_extraction.pdf
- Size: 3.49 MB
- Format: Adobe Portable Document Format