The Fragility of Multi-Treebank Parsing Evaluation

Alonso-Alonso, Iago; Vilares, David; Gómez-Rodríguez, Carlos

dc.contributor.author	Alonso-Alonso, Iago
dc.contributor.author	Vilares, David
dc.contributor.author	Gómez-Rodríguez, Carlos
dc.date.accessioned	2024-05-23T08:10:00Z
dc.date.available	2024-05-23T08:10:00Z
dc.date.issued	2022-10
dc.identifier.citation	Iago Alonso-Alonso, David Vilares, and Carlos Gómez-Rodríguez. 2022. The Fragility of Multi-Treebank Parsing Evaluation. In Proceedings of the 29th International Conference on Computational Linguistics, pages 5345–5359, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.	es_ES
dc.identifier.uri	http://hdl.handle.net/2183/36590
dc.description	Held in Gyeongju, Republic of Korea. October 12-17, 2022	es_ES
dc.description.abstract	[Abstract]: Treebank selection for parsing evaluation and the spurious effects that might arise from a biased choice have not been explored in detail. This paper studies how evaluating on a single subset of treebanks can lead to weak conclusions. First, we take a few contrasting parsers, and run them on subsets of treebanks proposed in previous work, whose use was justified (or not) on criteria such as typology or data scarcity. Second, we run a large-scale version of this experiment, create vast amounts of random subsets of treebanks, and compare on them many parsers whose scores are available. The results show substantial variability across subsets and that although establishing guidelines for good treebank selection is hard, some inadequate strategies can be easily avoided.	es_ES
dc.description.sponsorship	This work was supported by a 2020 Leonardo Grant for Researchers and Cultural Creators from the FBBVA, as well as by the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, grant agreement No 714150). The work is also supported by ERDF/MICINN-AEI (SCANNER-UDC, PID2020-113230RB-C21), by Xunta de Galicia (ED431C 2020/11), and by Centro de Investigación de Galicia “CITIC” which is funded by Xunta de Galicia, Spain and the European Union (ERDF - Galicia 2014–2020 Program), by grant ED431G 2019/01.	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431C 2020/11	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431G 2019/01	es_ES
dc.language.iso	eng	es_ES
dc.publisher	International Committee on Computational Linguistics	es_ES
dc.relation	info:eu-repo/grantAgreement/EC/H2020/714150	es_ES
dc.relation	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113230RB-C21/ES/MODELOS MULTITAREA DE ETIQUETADO SECUENCIAL PARA EL RECONOCIMIENTO DE ENTIDADES ENRIQUECIDO CON INFORMACIÓN LINGÜÍSTICA: SINTAXIS E INTEGRACIÓN MULTITAREA (SCANNER-UDC)	es_ES
dc.relation.uri	https://aclanthology.org/2022.coling-1.475.pdf	es_ES
dc.rights	Attribution 3.0 Spain	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/es/	*
dc.subject	Multi-Treebank Parsing Evaluation	es_ES
dc.subject	Treebank Selection Bias	es_ES
dc.subject	Evaluation Methodology	es_ES
dc.subject	Parsing Performance Variability	es_ES
dc.title	The Fragility of Multi-Treebank Parsing Evaluation	es_ES
dc.type	info:eu-repo/semantics/conferenceObject	es_ES
dc.rights.access	info:eu-repo/semantics/openAccess	es_ES
UDC.journalTitle	Proceedings of the 29th International Conference on Computational Linguistics	es_ES
UDC.startPage	5345	es_ES
UDC.endPage	5359	es_ES
UDC.conferenceTitle	29th International Conference on Computational Linguistics (COLING’2022)	es_ES

Files in this item

Name: AlonsoAlonso_Iago_2022_The_fra ...
Size: 443.7Kb
Format: PDF

Name: license_rdf
Size: 1.337Kb
Format: application/rdf+xml

This item appears in the following collection(s)

OpenAIRE [332]
GI-LYS - Congresses, conferences, etc. [61]