LyS A Coruña at GUA-SPA@IberLEF2023. Multi-Task Learning with Large Language Model Encoders for Guarani-Spanish Code Switching Analysis

View/Open
Munoz_Ortiz_2023_Guarani_Spanish_Code_Switching_Analysis.pdf (711.9Kb)
Use this link to cite
http://hdl.handle.net/2183/33478
Attribution 4.0 International
Except where otherwise noted, this item's license is described as Attribution 4.0 International
Collections
  • Investigación (FFIL) [877]
Metadata
Show full item record
Title
LyS A Coruña at GUA-SPA@IberLEF2023. Multi-Task Learning with Large Language Model Encoders for Guarani-Spanish Code Switching Analysis
Author(s)
Muñoz-Ortiz, Alberto
Vilares, David
Date
2023
Bibliographic citation
A. Muñoz-Ortiz, D. Vilares. LyS A Coruña at GUA-SPA@IberLEF2023: Multi-Task Learning with Large Language Model Encoders for Guarani-Spanish Code Switching Analysis. In: Proceedings of IberLEF 2023, Jaén, Spain.
Abstract
This paper introduces the LyS A Coruña proposal for the Guarani-Spanish Code Switching Analysis task at IberLEF2023. The shared task proposes to analyze Guarani-Spanish code-switched texts, focusing on language identification, named entity recognition (NER), and a novel classification task for Spanish spans in a code-switched Guarani-Spanish context. We propose three multi-task learning systems that share common encoders based on two language models and use different decoders in a multi-task learning setup. The encoders use contextual embeddings from: (i) a large language model (LLM) pretrained for bidirectional machine translation across 200 languages (including Spanish and Guarani) from the No Language Left Behind project, and (ii) a BERT-based model pretrained on Spanish and fine-tuned on around 800k Guarani tokens. The decoders are: (i) a softmax output layer for Task 1, and (ii) conditional random field (CRF) output layers for Tasks 2 and 3. According to the official results, we ranked third in all three tasks.
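A minimal sketch of the architecture the abstract describes: a shared pretrained encoder feeding one softmax head (Task 1) and two CRF heads (Tasks 2 and 3), with the three task losses summed for multi-task training. The encoder name, tag-set sizes, and the use of the pytorch-crf package are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel
from torchcrf import CRF  # pip install pytorch-crf

class MultiTaskTagger(nn.Module):
    def __init__(self, encoder_name="dccuchile/bert-base-spanish-wwm-cased",
                 n_lang_tags=5, n_ner_tags=9, n_span_tags=7):
        super().__init__()
        # Shared contextual encoder. The paper pairs its decoders with
        # either the NLLB-200 encoder or a Spanish BERT fine-tuned on
        # Guarani; the model name and tag-set sizes here are placeholders.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # Task 1 decoder: softmax output layer for language identification.
        self.lang_head = nn.Linear(hidden, n_lang_tags)
        # Task 2 and 3 decoders: CRF output layers over linear emissions.
        self.ner_emit = nn.Linear(hidden, n_ner_tags)
        self.ner_crf = CRF(n_ner_tags, batch_first=True)
        self.span_emit = nn.Linear(hidden, n_span_tags)
        self.span_crf = CRF(n_span_tags, batch_first=True)

    def forward(self, input_ids, attention_mask,
                lang_tags=None, ner_tags=None, span_tags=None):
        h = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.bool()
        loss = h.new_zeros(())
        lang_logits = self.lang_head(h)
        if lang_tags is not None:
            # Task 1: token-level cross-entropy; pad positions labelled -100.
            loss = loss + F.cross_entropy(lang_logits.transpose(1, 2),
                                          lang_tags, ignore_index=-100)
        if ner_tags is not None:
            # Tasks 2 and 3: the CRF returns a log-likelihood, so negate it.
            loss = loss - self.ner_crf(self.ner_emit(h), ner_tags, mask=mask)
        if span_tags is not None:
            loss = loss - self.span_crf(self.span_emit(h), span_tags, mask=mask)
        return loss, lang_logits

At inference time, calling self.ner_crf.decode(self.ner_emit(h), mask=mask) would yield Viterbi-decoded tag sequences for the CRF tasks, while Task 1 labels come from an argmax over lang_logits.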
Keywords
Multi-task learning
Guaraní
Spanish
Code switching
Language identification
Named entity recognition
Code classification
Rights
Attribution 4.0 International
