From Tokens to Trees: Mapping Syntactic Structures in the Deserts of Data-Scarce Languages

D. Vilares and A. Muñoz, "From Tokens to Trees: Mapping Syntactic Structures in the Deserts of Data-Scarce Languages", Seminar of the Spanish Society for Natural Language Processing: Projects and System Demonstrations (SEPLN-CEDI-PD 2024) co-located with the 7th Spanish Conference on Informatics (CEDI 2024), A Coruña, Spain, June 19-20, 2024. https://ceur-ws.org/Vol-3729/

Abstract

Low-resource learning in natural language processing focuses on developing effective resources, tools, and technologies for languages that receive less attention from industry and academia. This effort is crucial for several reasons. It helps ensure that as many languages as possible are represented digitally, and it broadens access to language technologies for native speakers of minority languages. In this context, this paper outlines the motivation, research lines, and results of a Leonardo Grant from the FBBVA on low-resource languages and parsing as sequence labeling. The project's primary aim was to devise fast and accurate methods for low-resource syntactic parsing. It also set out to examine evaluation strategies and to assess the strengths and weaknesses of this approach compared with alternative parsing strategies.

Keywords

cross-lingual learning
low-resource learning
multilinguality
natural language processing
parsing

Description

This project was supported by a 2020 Leonardo Grant for Researchers and Cultural Creators from the FBBVA.

Publisher's version

https://ceur-ws.org/Vol-3729/

Rights

Attribution (CC BY)