Not All Linearizations Are Equally Data-Hungry in Sequence Labeling Parsing

Muñoz-Ortiz, Alberto; Strzyz, Michalina; Vilares, David

Files

Muñoz_Ortiz_2021_Not_all_linearizations_equally_data_hungry.pdf (361.71 KB)

Identifiers

URI: http://hdl.handle.net/2183/36664

Publication date

2021-09

Authors

Muñoz-Ortiz, Alberto

Strzyz, Michalina

Vilares, David

Bibliographic citation

Alberto Muñoz-Ortiz, Michalina Strzyz, and David Vilares. 2021. Not All Linearizations Are Equally Data-Hungry in Sequence Labeling Parsing. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 978–988, Held Online. INCOMA Ltd.

Abstract

Different linearizations have been proposed to cast dependency parsing as sequence labeling and solve the task as: (i) a head selection problem, (ii) finding a representation of the token arcs as bracket strings, or (iii) associating partial transition sequences of a transition-based parser to words. Yet, there is little understanding about how these linearizations behave in low-resource setups. Here, we first study their data efficiency, simulating data-restricted setups from a diverse set of rich-resource treebanks. Second, we test whether such differences manifest in truly low-resource setups. The results show that head selection encodings are more data-efficient and perform better in an ideal (gold) framework, but that such advantage greatly vanishes in favour of bracketing formats when the running setup resembles a real-world low-resource configuration.

Description

The conference (RANLP 2021) was held online, 1-3 September 2021.

Keywords

Dependency Parsing; Sequence Labeling; Low-Resource NLP; Data Efficiency

Editor version

https://aclanthology.org/2021.ranlp-1.111/

Rights

Attribution 3.0 Spain (Atribución 3.0 España)

Collections

Investigación (FFIL)
