Enhancing discourse parsing for local structures from social media with LLM-generated data

UDC.coleccion: Research
UDC.conferenceTitle: COLING 2025: 31st International Conference on Computational Linguistics
UDC.departamento: Ciencias da Computación e Tecnoloxías da Información
UDC.endPage: 8748
UDC.grupoInv: Information Retrieval Lab (IRlab)
UDC.institutoCentro: CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación
UDC.journalTitle: Proceedings of the 31st International Conference on Computational Linguistics
UDC.startPage: 8739
dc.contributor.author: Pastor, Martial
dc.contributor.author: Oostdijk, Nelleke
dc.contributor.author: Martín-Rodilla, Patricia
dc.contributor.author: Parapar, Javier
dc.date.accessioned: 2025-05-21T08:29:42Z
dc.date.available: 2025-05-21T08:29:42Z
dc.date.issued: 2025-01
dc.description: The conference took place in Abu Dhabi, UAE, from 19 to 24 January 2025
dc.description.abstract: [Abstract]: We explore the use of discourse parsers for extracting a particular discourse structure in a real-world social media scenario. Specifically, we focus on enhancing parser performance through the integration of synthetic data generated by large language models (LLMs). We conduct experiments using a newly developed dataset of 1,170 local RST discourse structures, including 900 synthetic and 270 gold examples, covering three social media platforms: online news comments sections, a discussion forum (Reddit), and a social media messaging platform (Twitter). Our primary goal is to assess the impact of LLM-generated synthetic training data on parser performance in a raw text setting without pre-identified discourse units. While both top-down and bottom-up RST architectures greatly benefit from synthetic data, challenges remain in classifying evaluative discourse structures.
dc.description.sponsorship: This work was produced as part of the HYBRIDS project, a Marie Skłodowska-Curie Doctoral Network funded by the European Union under grant no. 101073351 and the UK Research and Innovation (UKRI) Horizon Funding Guarantee.
dc.identifier.citation: Martial Pastor, Nelleke Oostdijk, Patricia Martin-Rodilla, and Javier Parapar. 2025. Enhancing Discourse Parsing for Local Structures from Social Media with LLM-Generated Data. In Proceedings of the 31st International Conference on Computational Linguistics, pages 8739–8748, Abu Dhabi, UAE. Association for Computational Linguistics.
dc.identifier.isbn: 979-889176196-4
dc.identifier.issn: 2951-2093
dc.identifier.uri: http://hdl.handle.net/2183/42047
dc.language.iso: eng
dc.publisher: Association for Computational Linguistics (ACL)
dc.relation.projectID: info:eu-repo/grantAgreement/EC/HE/101073351
dc.relation.uri: https://aclanthology.org/2025.coling-main.584.pdf
dc.rights: Attribution 3.0 Spain
dc.rights: ©2025 Association for Computational Linguistics. Materials published in or after 2016 are licensed on a Creative Commons Attribution 4.0 International License
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/
dc.subject: Computational Linguistics
dc.subject: Discourse
dc.subject: Parsing
dc.title: Enhancing discourse parsing for local structures from social media with LLM-generated data
dc.type: conference output
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: a1440782-cd8e-4634-b8f3-936cc0220cdb
relation.isAuthorOfPublication: fef1a9cb-e346-4e53-9811-192e144f09d0
relation.isAuthorOfPublication.latestForDiscovery: a1440782-cd8e-4634-b8f3-936cc0220cdb

Files

Original bundle

Name:
Pastor_Martial_2025_Enhancing_discourse.pdf
Size:
459.06 KB
Format:
Adobe Portable Document Format