Revisiting Supertagging for faster HPSG parsing

UDC.coleccion: Investigación
UDC.conferenceTitle: EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing
UDC.departamento: Ciencias da Computación e Tecnoloxías da Información
UDC.endPage: 11374
UDC.grupoInv: Lingua e Sociedade da Información (LYS)
UDC.institutoCentro: CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación
UDC.startPage: 11359
UDC.volume: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
dc.contributor.author: Zamaraeva, Olga
dc.contributor.author: Gómez-Rodríguez, Carlos
dc.date.accessioned: 2025-03-05T15:41:12Z
dc.date.available: 2025-03-05T15:41:12Z
dc.date.issued: 2024-11
dc.description: Presented at: Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, 12-16 Nov. 2024.
dc.description.abstract: We present new supertaggers trained on English HPSG-based treebanks and test the effects of the best tagger on parsing speed and accuracy. HPSG treebanks are produced automatically by large manually built grammars and feature high-quality annotation based on a well-developed linguistic theory. The English Resource Grammar treebanks include diverse and challenging test datasets, beyond the usual WSJ section 23 and Wikipedia data. HPSG supertagging has previously relied on MaxEnt-based models. We use SVM and neural CRF- and BERT-based methods and show that both SVM and neural supertaggers achieve considerably higher accuracy compared to the baseline and lead to an increase not only in the parsing speed but also the parser accuracy with respect to gold dependency structures. Our fine-tuned BERT-based tagger achieves 97.26% accuracy on 950 sentences from WSJ23 and 93.88% on the out-of-domain technical essay The Cathedral and the Bazaar. We present experiments with integrating the best supertagger into an HPSG parser and observe a speedup of a factor of 3 with respect to the system which uses no tagging at all, as well as large recall gains and an overall precision gain. We also compare our system to an existing integrated tagger and show that although the well-integrated tagger remains the fastest, our experimental system can be more accurate. Finally, we hope that the diverse and difficult datasets we used for evaluation will gain more popularity in the field: we show that results can differ depending on the dataset, even if it is an in-domain one. We contribute the complete datasets reformatted for Huggingface token classification.
dc.description.sponsorship: We acknowledge the European Union’s Horizon Europe Framework Programme which funded this research under the Marie Skłodowska-Curie postdoctoral fellowship grant HORIZON-MSCA-2021-PF-01 (GAUSS, grant agreement No 101063104); and the European Research Council (ERC), which has funded this research under the Horizon Europe research and innovation programme (SALSA, grant agreement No 101100615). We also acknowledge grants SCANNER-UDC (PID2020-113230RB-C21) funded by MICIU/AEI/10.13039/501100011033; GAP (PID2022-139308OA-I00) funded by MICIU/AEI/10.13039/501100011033/ and ERDF, EU; LATCHING (PID2023-147129OB-C21) funded by MICIU/AEI/10.13039/501100011033 and ERDF, EU; and TSI-100925-2023-1 funded by Ministry for Digital Transformation and Civil Service and “NextGenerationEU” PRTR; as well as funding by Xunta de Galicia (ED431C 2024/02), and Centro de Investigación de Galicia “CITIC”, funded by the Xunta de Galicia through the collaboration agreement between the Consellería de Cultura, Educación, Formación Profesional e Universidades and the Galician universities for the reinforcement of the research centres of the Galician University System (CIGUS).
dc.description.sponsorship: Xunta de Galicia; ED431C 2024/02
dc.identifier.citation: Olga Zamaraeva and Carlos Gómez-Rodríguez. 2024. Revisiting Supertagging for faster HPSG parsing. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 11359–11374, Miami, Florida, USA. Association for Computational Linguistics. doi: 10.18653/v1/2024.emnlp-main.635
dc.identifier.doi: 10.18653/v1/2024.emnlp-main.635
dc.identifier.uri: http://hdl.handle.net/2183/41300
dc.language.iso: eng
dc.publisher: Association for Computational Linguistics
dc.relation.projectID: info:eu-repo/grantAgreement/EC/HE/101063104
dc.relation.projectID: info:eu-repo/grantAgreement/EC/HE/101100615
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113230RB-C21/ES/MODELOS MULTITAREA DE ETIQUETADO SECUENCIAL PARA EL RECONOCIMIENTO DE ENTIDADES ENRIQUECIDO CON INFORMACIÓN LINGÜÍSTICA: SINTAXIS E INTEGRACIÓN MULTITAREA (SCANNER-UDC)
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-139308OA-100/ES/REPRESENTACIONES ESTRUCTURADAS VERDES Y ENCHUFABLES
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2023-147129OB-C21/ES/TECNOLOGÍAS DEL LENGUAJE DESDE UNA PERSPECTIVA VERDE (LATCHING): DOMINIOS CON ESCASOS RECURSOS
dc.relation.projectID: info:eu-repo/grantAgreement/MTDPF//TSI-100925-2023-1/ES/CÁTEDRA UDC-INDITEX DE IA EN ALGORITMOS VERDES
dc.relation.uri: https://doi.org/10.18653/v1/2024.emnlp-main.635
dc.rights: Attribution 4.0 International
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/es/
dc.subject: Supertagging
dc.subject: HPSG (Head-Driven Phrase Structure Grammar)
dc.subject: Parsing
dc.subject: SVM
dc.subject: CRF
dc.subject: BERT (Bidirectional Encoder Representations from Transformers)
dc.subject: Token Classification
dc.title: Revisiting Supertagging for faster HPSG parsing
dc.type: conference output
dspace.entity.type: Publication
relation.isAuthorOfPublication: e70a3969-39f6-4458-9339-3b71756fa56e
relation.isAuthorOfPublication.latestForDiscovery: e70a3969-39f6-4458-9339-3b71756fa56e

Files

Original bundle

Name: Zamaraeva_Olga_2024_Revisiting_Supertagging_for_faster_HPSG_parsing.pdf
Size: 272.76 KB
Format: Adobe Portable Document Format
Description: