
LyS A Coruña at GUA-SPA@IberLEF2023. Multi-Task Learning with Large Language Model Encoders for Guarani-Spanish Code Switching Analysis

View/Open
Munoz_Ortiz_2023_Guarani_Spanish_Code_Switching_Analysis.pdf (711.9Kb)
Use this link to cite
http://hdl.handle.net/2183/33478
Except where otherwise noted, this item's license is described as Attribution 4.0 International.
Collections
  • Investigación (FFIL) [877]
Metadata
Title
LyS A Coruña at GUA-SPA@IberLEF2023. Multi-Task Learning with Large Language Model Encoders for Guarani-Spanish Code Switching Analysis
Author(s)
Muñoz-Ortiz, Alberto
Vilares, David
Date
2023
Citation
A. Muñoz Ortiz, D. Vilares. LyS A Coruña at GUA-SPA@IberLEF2023: Multi-Task Learning with Large Language Model Encoders for Guarani-Spanish Code Switching Analysis, in: Proceedings of IberLEF 2023, Jaén, Spain
Abstract
[Abstract] This paper introduces the LyS A Coruña proposal for the Guarani-Spanish Code Switching Analysis task at IberLEF2023. The shared task proposes to analyze Guarani-Spanish code-switched texts, focusing on language identification, named entity recognition (NER), and a novel classification task for Spanish spans in a code-switched Guarani-Spanish context. We propose three systems that share encoders based on two language models and use task-specific decoders in a multi-task learning setup. The encoders use contextual embeddings from: (i) a large language model (LLM) pretrained on bidirectional machine translation across 200 languages (including Spanish and Guarani) from the No Language Left Behind project, and (ii) a BERT-based model pretrained on Spanish and fine-tuned on around 800k Guarani tokens. The decoders are: (i) a softmax output layer for Task 1, and (ii) conditional random field (CRF) output layers for Tasks 2 and 3. According to the official results, we ranked third in all three tasks.
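The architecture described in the abstract (one shared language-model encoder feeding a softmax head for Task 1 and CRF heads for Tasks 2 and 3) can be made concrete with a minimal PyTorch sketch. This is not the authors' released code: the encoder checkpoint name, the tag-set sizes, and the use of the pytorch-crf package for the CRF layers are all assumptions for illustration.

import torch
import torch.nn as nn
from torchcrf import CRF            # pip install pytorch-crf (assumed CRF implementation)
from transformers import AutoModel  # pip install transformers

class MultiTaskTagger(nn.Module):
    """Shared encoder with one softmax head (Task 1) and two CRF heads (Tasks 2-3)."""

    def __init__(self, encoder_name="bert-base-multilingual-cased",
                 n_lang=5, n_ner=9, n_span=7):
        super().__init__()
        # Shared contextual encoder. The paper uses an NLLB-based encoder and a
        # Spanish BERT fine-tuned on Guarani; this checkpoint name and the tag-set
        # sizes are placeholders, not the authors' choices.
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.lang_head = nn.Linear(hidden, n_lang)    # Task 1: softmax over language tags
        self.ner_emit = nn.Linear(hidden, n_ner)      # Task 2: emission scores for the CRF
        self.ner_crf = CRF(n_ner, batch_first=True)
        self.span_emit = nn.Linear(hidden, n_span)    # Task 3: emission scores for the CRF
        self.span_crf = CRF(n_span, batch_first=True)

    def forward(self, input_ids, attention_mask, lang_tags, ner_tags, span_tags):
        # One encoder pass shared by all three tasks.
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        mask = attention_mask.bool()
        # Task 1: token-level cross-entropy on the softmax head.
        ce = nn.functional.cross_entropy(
            self.lang_head(h).transpose(1, 2), lang_tags, reduction="none")
        loss_lang = (ce * mask).sum() / mask.sum()
        # Tasks 2 and 3: negative CRF log-likelihood over masked sequences.
        loss_ner = -self.ner_crf(self.ner_emit(h), ner_tags, mask=mask, reduction="mean")
        loss_span = -self.span_crf(self.span_emit(h), span_tags, mask=mask, reduction="mean")
        # Unweighted sum of the three objectives; any loss weighting the
        # authors may apply is not specified here.
        return loss_lang + loss_ner + loss_span

At inference time, the CRF heads would decode with Viterbi (in pytorch-crf, ner_crf.decode(emissions, mask=mask)), while Task 1 labels come from an argmax over the softmax head.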
Keywords
Multi-task learning
Guaraní
Spanish
Code switching
Language identification
Named entity recognition
Code classification
Rights
Attribution 4.0 International
