LyS A Coruña at GUA-SPA@IberLEF2023. Multi-Task Learning with Large Language Model Encoders for Guarani-Spanish Code Switching Analysis

UDC.coleccion: Research (es_ES)
UDC.conferenceTitle: IberLEF 2023 (es_ES)
UDC.departamento: Letras (es_ES)
UDC.grupoInv: Lingua e Sociedade da Información (LYS) (es_ES)
dc.contributor.author: Muñoz-Ortiz, Alberto
dc.contributor.author: Vilares, David
dc.date.accessioned: 2023-09-13T07:34:38Z
dc.date.available: 2023-09-13T07:34:38Z
dc.date.issued: 2023
dc.description.abstract: [Abstract] This paper introduces the LyS A Coruña proposal for the Guarani-Spanish Code Switching Analysis task at IberLEF2023. The shared task involves analyzing Guarani-Spanish code-switched texts, focusing on language identification, named entity recognition (NER), and a novel classification task for Spanish spans in a code-switched Guarani-Spanish context. We propose three multi-task learning systems that share encoders built on two language models and use task-specific decoders. The encoders use contextual embeddings produced by: (i) a large language model (LLM) pretrained on bidirectional machine translation across 200 languages (including Spanish and Guarani) from the No Language Left Behind project, and (ii) a BERT-based model pretrained on Spanish and fine-tuned on around 800k Guarani tokens. The decoders are: (i) a softmax output layer for Task 1, and (ii) conditional random field (CRF) output layers for Tasks 2 and 3. According to the official results, we ranked third in all three tasks. (es_ES)
dc.description.sponsorship: SCANNER-UDC; PID2020-113230RB-C21 Ministerio de Economía e Industria; PID2020-113230RB-C21 Xunta de Galicia; ED431C 2020/11 CITIC; ED431G 2019/01. (es_ES)
dc.identifier.citation: A. Muñoz Ortiz, D. Vilares. LyS A Coruña at GUA-SPA@IberLEF2023: Multi-Task Learning with Large Language Model Encoders for Guarani-Spanish Code Switching Analysis, in: Proceedings of IberLEF 2023, Jaén, Spain (es_ES)
dc.identifier.uri: http://hdl.handle.net/2183/33478
dc.language.iso: eng (es_ES)
dc.relation.projectID: info:eu-repo/grantAgreemet/EC/H2020/101100615 (es_ES)
dc.rights: Attribution 4.0 International (es_ES)
dc.rights.accessRights: open access (es_ES)
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/ (*)
dc.subject: Multi-task learning (es_ES)
dc.subject: Guaraní (es_ES)
dc.subject: Spanish (es_ES)
dc.subject: Code switching (es_ES)
dc.subject: Language identification (es_ES)
dc.subject: Named entity recognition (es_ES)
dc.subject: Code classification (es_ES)
dc.title: LyS A Coruña at GUA-SPA@IberLEF2023. Multi-Task Learning with Large Language Model Encoders for Guarani-Spanish Code Switching Analysis (es_ES)
dc.type: conference output (es_ES)
dspace.entity.type: Publication
relation.isAuthorOfPublication: edf1cde8-d272-4a73-bdd3-9be2361b7651
relation.isAuthorOfPublication: 37dabbe9-f54f-43bb-960e-0bf3ac7e54eb
relation.isAuthorOfPublication.latestForDiscovery: edf1cde8-d272-4a73-bdd3-9be2361b7651

Files

Original bundle

Name: Munoz_Ortiz_2023_Guarani_Spanish_Code_Switching_Analysis.pdf
Size: 711.95 KB
Format: Adobe Portable Document Format