Not All Linearizations Are Equally Data-Hungry in Sequence Labeling Parsing

Bibliographic citation

Alberto Muñoz-Ortiz, Michalina Strzyz, and David Vilares. 2021. Not All Linearizations Are Equally Data-Hungry in Sequence Labeling Parsing. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 978–988, Held Online. INCOMA Ltd.

Abstract

[Abstract]: Different linearizations have been proposed to cast dependency parsing as sequence labeling and solve the task as: (i) a head selection problem, (ii) finding a representation of the token arcs as bracket strings, or (iii) associating partial transition sequences of a transition-based parser to words. Yet, there is little understanding of how these linearizations behave in low-resource setups. Here, we first study their data efficiency, simulating data-restricted setups from a diverse set of rich-resource treebanks. Second, we test whether such differences manifest in truly low-resource setups. The results show that head selection encodings are more data-efficient and perform better in an ideal (gold) framework, but that this advantage largely vanishes in favour of bracketing formats when the running setup resembles a real-world low-resource configuration.
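
As a rough illustration of option (i), head selection can be viewed as giving each token one label that says where its head is. The sketch below is a simplified assumption for illustration only: the function names `encode_relative_heads` and `decode_relative_heads` are hypothetical, the labels encode just a signed word offset to the head, and the encodings actually evaluated in the paper may differ (for instance, by also carrying the dependency relation).

```python
from typing import List


def encode_relative_heads(heads: List[int]) -> List[str]:
    """Turn a list of head indices into one label per token.

    heads[i] is the 1-based position of the head of token i + 1,
    with 0 marking the root (an assumed convention for this sketch).
    """
    labels = []
    for position, head in enumerate(heads, start=1):
        if head == 0:
            labels.append("ROOT")
        else:
            # Signed offset from the token to its head, e.g. "+1" or "-2".
            labels.append(f"{head - position:+d}")
    return labels


def decode_relative_heads(labels: List[str]) -> List[int]:
    """Invert encode_relative_heads, recovering the head indices."""
    heads = []
    for position, label in enumerate(labels, start=1):
        heads.append(0 if label == "ROOT" else position + int(label))
    return heads


# Hypothetical three-token sentence: token 1 -> token 2 -> token 3 (root).
heads = [2, 3, 0]
labels = encode_relative_heads(heads)
recovered = decode_relative_heads(labels)
```

Because every token receives exactly one label, a standard sequence tagger can predict the labels and the tree is recovered by decoding them, which is the general idea behind treating parsing as sequence labeling.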

Description

The conference was held online, 1–3 September 2021.

Rights

Attribution 3.0 Spain

Except where otherwise noted, this item's license is described as Attribution 3.0 Spain