Revisiting Supertagging for faster HPSG parsing
| UDC.coleccion | Investigación | es_ES |
| UDC.conferenceTitle | EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing | es_ES |
| UDC.departamento | Ciencias da Computación e Tecnoloxías da Información | es_ES |
| UDC.endPage | 11374 | es_ES |
| UDC.grupoInv | Lingua e Sociedade da Información (LYS) | es_ES |
| UDC.institutoCentro | CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación | es_ES |
| UDC.startPage | 11359 | es_ES |
| UDC.volume | Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing | es_ES |
| dc.contributor.author | Zamaraeva, Olga | |
| dc.contributor.author | Gómez-Rodríguez, Carlos | |
| dc.date.accessioned | 2025-03-05T15:41:12Z | |
| dc.date.available | 2025-03-05T15:41:12Z | |
| dc.date.issued | 2024-11 | |
| dc.description | Presented at: Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, 12-16 Nov. 2024. | es_ES |
| dc.description.abstract | [Abstract]: We present new supertaggers trained on English HPSG-based treebanks and test the effects of the best tagger on parsing speed and accuracy. HPSG treebanks are produced automatically by large manually built grammars and feature high-quality annotation based on a well-developed linguistic theory. The English Resource Grammar treebanks include diverse and challenging test datasets, beyond the usual WSJ section 23 and Wikipedia data. HPSG supertagging has previously relied on MaxEnt-based models. We use SVM and neural CRF- and BERT-based methods and show that both SVM and neural supertaggers achieve considerably higher accuracy compared to the baseline and lead to an increase not only in the parsing speed but also the parser accuracy with respect to gold dependency structures. Our fine-tuned BERT-based tagger achieves 97.26% accuracy on 950 sentences from WSJ23 and 93.88% on the out-of-domain technical essay The Cathedral and the Bazaar. We present experiments with integrating the best supertagger into an HPSG parser and observe a speedup of a factor of 3 with respect to the system which uses no tagging at all, as well as large recall gains and an overall precision gain. We also compare our system to an existing integrated tagger and show that although the well-integrated tagger remains the fastest, our experimental system can be more accurate. Finally, we hope that the diverse and difficult datasets we used for evaluation will gain more popularity in the field: we show that results can differ depending on the dataset, even if it is an in-domain one. We contribute the complete datasets reformatted for Huggingface token classification. | es_ES |
| dc.description.sponsorship | We acknowledge the European Union’s Horizon Europe Framework Programme, which funded this research under the Marie Skłodowska-Curie postdoctoral fellowship grant HORIZON-MSCA-2021-PF-01 (GAUSS, grant agreement No 101063104); and the European Research Council (ERC), which has funded this research under the Horizon Europe research and innovation programme (SALSA, grant agreement No 101100615). We also acknowledge grants SCANNER-UDC (PID2020-113230RB-C21) funded by MICIU/AEI/10.13039/501100011033; GAP (PID2022-139308OA-I00) funded by MICIU/AEI/10.13039/501100011033 and ERDF, EU; LATCHING (PID2023-147129OB-C21) funded by MICIU/AEI/10.13039/501100011033 and ERDF, EU; and TSI-100925-2023-1 funded by Ministry for Digital Transformation and Civil Service and “NextGenerationEU” PRTR; as well as funding by Xunta de Galicia (ED431C 2024/02), and Centro de Investigación de Galicia “CITIC”, funded by the Xunta de Galicia through the collaboration agreement between the Consellería de Cultura, Educación, Formación Profesional e Universidades and the Galician universities for the reinforcement of the research centres of the Galician University System (CIGUS). | es_ES |
| dc.description.sponsorship | Xunta de Galicia; ED431C 2024/02 | es_ES |
| dc.identifier.citation | Olga Zamaraeva and Carlos Gómez-Rodríguez. 2024. Revisiting Supertagging for faster HPSG parsing. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 11359–11374, Miami, Florida, USA. Association for Computational Linguistics. doi: 10.18653/v1/2024.emnlp-main.635 | es_ES |
| dc.identifier.doi | 10.18653/v1/2024.emnlp-main.635 | |
| dc.identifier.uri | http://hdl.handle.net/2183/41300 | |
| dc.language.iso | eng | es_ES |
| dc.publisher | Association for Computational Linguistics | es_ES |
| dc.relation.projectID | info:eu-repo/grantAgreement/EC/HE/101063104 | es_ES |
| dc.relation.projectID | info:eu-repo/grantAgreement/EC/HE/101100615 | es_ES |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113230RB-C21/ES/MODELOS MULTITAREA DE ETIQUETADO SECUENCIAL PARA EL RECONOCIMIENTO DE ENTIDADES ENRIQUECIDO CON INFORMACIÓN LINGÜÍSTICA: SINTAXIS E INTEGRACIÓN MULTITAREA (SCANNER-UDC) | es_ES |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-139308OA-I00/ES/REPRESENTACIONES ESTRUCTURADAS VERDES Y ENCHUFABLES | es_ES |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2023-147129OB-C21/ES/TECNOLOGÍAS DEL LENGUAJE DESDE UNA PERSPECTIVA VERDE (LATCHING): DOMINIOS CON ESCASOS RECURSOS | es_ES |
| dc.relation.projectID | info:eu-repo/grantAgreement/MTDPF//TSI-100925-2023-1/ES/CÁTEDRA UDC-INDITEX DE IA EN ALGORITMOS VERDES | es_ES |
| dc.relation.uri | https://doi.org/10.18653/v1/2024.emnlp-main.635 | es_ES |
| dc.rights | Attribution 4.0 International | es_ES |
| dc.rights.accessRights | open access | es_ES |
| dc.rights.uri | http://creativecommons.org/licenses/by/3.0/es/ | * |
| dc.subject | Supertagging | es_ES |
| dc.subject | HPSG (Head-Driven Phrase Structure Grammar) | es_ES |
| dc.subject | Parsing | es_ES |
| dc.subject | SVM | es_ES |
| dc.subject | CRF | es_ES |
| dc.subject | BERT (Bidirectional Encoder Representations from Transformers) | es_ES |
| dc.subject | Token Classification | es_ES |
| dc.title | Revisiting Supertagging for faster HPSG parsing | es_ES |
| dc.type | conference output | es_ES |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | e70a3969-39f6-4458-9339-3b71756fa56e | |
| relation.isAuthorOfPublication.latestForDiscovery | e70a3969-39f6-4458-9339-3b71756fa56e | |
Files
Original bundle
- Name: Zamaraeva_Olga_2024_Revisiting_Supertagging_for_faster_HPSG_parsing.pdf
- Size: 272.76 KB
- Format: Adobe Portable Document Format

