
dc.contributor.author: Muñoz Ortiz, Alberto
dc.contributor.author: Vilares, David
dc.date.accessioned: 2023-09-13T07:34:38Z
dc.date.available: 2023-09-13T07:34:38Z
dc.date.issued: 2023
dc.identifier.citation: A. Muñoz Ortiz, D. Vilares. LyS A Coruña at GUA-SPA@IberLEF2023: Multi-Task Learning with Large Language Model Encoders for Guarani-Spanish Code Switching Analysis, in: Proceedings of IberLEF 2023, Jaén, Spain [es_ES]
dc.identifier.uri: http://hdl.handle.net/2183/33478
dc.description.abstract: [Abstract] This paper introduces the LyS A Coruña proposal for the Guarani-Spanish Code Switching Analysis task at IberLEF2023. The shared task involves analyzing Guarani-Spanish code-switched texts, focusing on language identification, named entity recognition (NER), and a novel classification task for Spanish spans in a code-switched Guarani-Spanish context. We propose three multi-task learning systems that share encoders based on two language models and use different decoders. The encoders use contextual embeddings from: (i) a large language model (LLM) pretrained on bidirectional machine translation on 200 languages (including Spanish and Guarani) from the No Language Left Behind project, and (ii) a BERT-based model pretrained on Spanish and fine-tuned on around 800k Guarani tokens. The decoders are: (i) a softmax output layer for Task 1, and (ii) conditional random fields (CRF) output layers for Tasks 2 and 3. According to official results, we ranked third in all three tasks. [es_ES]
dc.description.sponsorship: SCANNER-UDC; PID2020-113230RB-C21 Ministerio de Economía e Industria; PID2020-113230RB-C21 Xunta de Galicia; ED431C 2020/11 CITIC; ED431G 2019/01. [es_ES]
dc.language.iso: eng [es_ES]
dc.relation: info:eu-repo/grantAgreement/EC/H2020/101100615 [es_ES]
dc.rights: Attribution 4.0 International [es_ES]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/ [*]
dc.subject: Multi-Task Learning [es_ES]
dc.subject: Guaraní [es_ES]
dc.subject: Spanish [es_ES]
dc.subject: Code-switching [es_ES]
dc.subject: Language identification [es_ES]
dc.subject: Named Entity Recognition [es_ES]
dc.subject: Code Classification [es_ES]
dc.title: LyS A Coruña at GUA-SPA@IberLEF2023. Multi-Task Learning with Large Language Model Encoders for Guarani-Spanish Code Switching Analysis [es_ES]
dc.type: info:eu-repo/semantics/conferenceObject [es_ES]
dc.rights.access: info:eu-repo/semantics/openAccess [es_ES]
UDC.conferenceTitle: IberLEF 2023 [es_ES]

