dc.contributor.author | Muñoz Ortiz, Alberto | |
dc.contributor.author | Vilares, David | |
dc.date.accessioned | 2023-09-13T07:34:38Z | |
dc.date.available | 2023-09-13T07:34:38Z | |
dc.date.issued | 2023 | |
dc.identifier.citation | A. Muñoz Ortiz, D. Vilares. LyS A Coruña at GUA-SPA@IberLEF2023: Multi-Task Learning with Large Language Model Encoders for Guarani-Spanish Code Switching Analysis, in: Proceedings of IberLEF 2023, Jaén, Spain | es_ES |
dc.identifier.uri | http://hdl.handle.net/2183/33478 | |
dc.description.abstract | [Abstract] This paper introduces the LyS A Coruña proposal for the Guarani-Spanish Code Switching Analysis
task at IberLEF2023. The shared task involves analyzing Guarani-Spanish code-switched texts, focusing
on language identification, named entity recognition (NER), and a novel classification task for Spanish
spans in a code-switched Guarani-Spanish context. We propose three multi-task learning systems with
shared encoders, based on two language models, and task-specific decoders. The encoders use contextual
embeddings from: (i) a large language model (LLM) from the No Language Left Behind project, pretrained
for bidirectional machine translation across 200 languages (including Spanish and Guarani), and (ii) a
BERT-based model pretrained on Spanish and fine-tuned on around 800k Guarani tokens. The decoders are:
(i) a softmax output layer for Task 1, and (ii) conditional random field (CRF) output layers for Tasks 2
and 3. According to the official results, we ranked third in all three tasks. | es_ES |
dc.description.sponsorship | SCANNER-UDC; PID2020-113230RB-C21
Ministerio de Economía e Industria; PID2020-113230RB-C21
Xunta de Galicia; ED431C 2020/11
CITIC; ED431G 2019/01. | es_ES |
dc.language.iso | eng | es_ES |
dc.relation | info:eu-repo/grantAgreement/EC/H2020/101100615 | es_ES |
dc.rights | Attribution 4.0 International | es_ES |
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/es/ | * |
dc.subject | Multi-Task Learning | es_ES |
dc.subject | Guaraní | es_ES |
dc.subject | Spanish | es_ES |
dc.subject | Code-switching | es_ES |
dc.subject | Language identification | es_ES |
dc.subject | Named Entity Recognition | es_ES |
dc.subject | Code Classification | es_ES |
dc.title | LyS A Coruña at GUA-SPA@IberLEF2023: Multi-Task Learning with Large Language Model Encoders for Guarani-Spanish Code Switching Analysis | es_ES |
dc.type | info:eu-repo/semantics/conferenceObject | es_ES |
dc.rights.access | info:eu-repo/semantics/openAccess | es_ES |
UDC.conferenceTitle | IberLEF 2023 | es_ES |