The Fragility of Multi-Treebank Parsing Evaluation
dc.contributor.author | Alonso-Alonso, Iago | |
dc.contributor.author | Vilares, David | |
dc.contributor.author | Gómez-Rodríguez, Carlos | |
dc.date.accessioned | 2024-05-23T08:10:00Z | |
dc.date.available | 2024-05-23T08:10:00Z | |
dc.date.issued | 2022-10 | |
dc.identifier.citation | Iago Alonso-Alonso, David Vilares, and Carlos Gómez-Rodríguez. 2022. The Fragility of Multi-Treebank Parsing Evaluation. In Proceedings of the 29th International Conference on Computational Linguistics, pages 5345–5359, Gyeongju, Republic of Korea. International Committee on Computational Linguistics. | es_ES |
dc.identifier.uri | http://hdl.handle.net/2183/36590 | |
dc.description | Held in Gyeongju, Republic of Korea. October 12-17, 2022 | es_ES |
dc.description.abstract | [Abstract]: Treebank selection for parsing evaluation and the spurious effects that might arise from a biased choice have not been explored in detail. This paper studies how evaluating on a single subset of treebanks can lead to weak conclusions. First, we take a few contrasting parsers, and run them on subsets of treebanks proposed in previous work, whose use was justified (or not) on criteria such as typology or data scarcity. Second, we run a large-scale version of this experiment, create vast amounts of random subsets of treebanks, and compare on them many parsers whose scores are available. The results show substantial variability across subsets and that although establishing guidelines for good treebank selection is hard, some inadequate strategies can be easily avoided. | es_ES |
dc.description.sponsorship | This work was supported by a 2020 Leonardo Grant for Researchers and Cultural Creators from the FBBVA, as well as by the European Research Council (ERC), under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, grant agreement No 714150). The work is also supported by ERDF/MICINN-AEI (SCANNER-UDC, PID2020-113230RB-C21), by Xunta de Galicia (ED431C 2020/11), and by Centro de Investigación de Galicia “CITIC” which is funded by Xunta de Galicia, Spain and the European Union (ERDF - Galicia 2014–2020 Program), by grant ED431G 2019/01. | es_ES |
dc.description.sponsorship | Xunta de Galicia; ED431C 2020/11 | es_ES |
dc.description.sponsorship | Xunta de Galicia; ED431G 2019/01 | es_ES |
dc.language.iso | eng | es_ES |
dc.publisher | International Committee on Computational Linguistics | es_ES |
dc.relation | info:eu-repo/grantAgreement/EC/H2020/714150 | es_ES |
dc.relation | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113230RB-C21/ES/MODELOS MULTITAREA DE ETIQUETADO SECUENCIAL PARA EL RECONOCIMIENTO DE ENTIDADES ENRIQUECIDO CON INFORMACIÓN LINGÜÍSTICA: SINTAXIS E INTEGRACIÓN MULTITAREA (SCANNER-UDC) | es_ES |
dc.relation.uri | https://aclanthology.org/2022.coling-1.475.pdf | es_ES |
dc.rights | Attribution 3.0 Spain | es_ES |
dc.rights.uri | http://creativecommons.org/licenses/by/3.0/es/ | * |
dc.subject | Multi-Treebank Parsing Evaluation | es_ES |
dc.subject | Treebank Selection Bias | es_ES |
dc.subject | Evaluation Methodology | es_ES |
dc.subject | Parsing Performance Variability | es_ES |
dc.title | The Fragility of Multi-Treebank Parsing Evaluation | es_ES |
dc.type | info:eu-repo/semantics/conferenceObject | es_ES |
dc.rights.access | info:eu-repo/semantics/openAccess | es_ES |
UDC.journalTitle | Proceedings of the 29th International Conference on Computational Linguistics | es_ES |
UDC.startPage | 5345 | es_ES |
UDC.endPage | 5359 | es_ES |
UDC.conferenceTitle | 29th International Conference on Computational Linguistics (COLING’2022) | es_ES |
This item appears in the following collection(s)
- OpenAIRE [332]
- GI-LYS - Congresos, conferencias, etc. [61]