Parsing as Pretraining

UDC.coleccion: Investigación [es_ES]
UDC.conferenceTitle: Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20) [es_ES]
UDC.departamento: Letras [es_ES]
UDC.grupoInv: Lingua e Sociedade da Información (LYS) [es_ES]
UDC.issue: 5
UDC.volume: 34
dc.contributor.author: Vilares, David
dc.contributor.author: Strzyz, Michalina
dc.contributor.author: Søgaard, Anders
dc.contributor.author: Gómez-Rodríguez, Carlos
dc.date.accessioned: 2020-02-13T15:14:34Z
dc.date.available: 2020-02-13T15:14:34Z
dc.date.issued: 2020
dc.description.abstract: [Abstract] Recent analyses suggest that encoders pretrained for language modeling capture certain morpho-syntactic structure. However, probing frameworks for word vectors still do not report results on standard setups such as constituent and dependency parsing. This paper addresses this problem and does full parsing (on English) relying only on pretraining architectures – and no decoding. We first cast constituent and dependency parsing as sequence tagging. We then use a single feed-forward layer to directly map word vectors to labels that encode a linearized tree. This is used to: (i) see how far we can reach on syntax modelling with just pretrained encoders, and (ii) shed some light about the syntax-sensitivity of different word vectors (by freezing the weights of the pretraining network during training). For evaluation, we use bracketing F1-score and LAS, and analyze in-depth differences across representations for span lengths and dependency displacements. The overall results surpass existing sequence tagging parsers on the PTB (93.5%) and end-to-end EN-EWT UD (78.8%). [es_ES]
dc.description.sponsorship: We thank Mark Anderson and Daniel Hershcovich for their comments. DV, MS and CGR are funded by the ERC under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, grant No 714150), by the ANSWER-ASAP project (TIN2017-85160-C2-1-R) from MINECO, and by Xunta de Galicia (ED431B 2017/01). AS is funded by a Google Focused Research Award [es_ES]
dc.description.sponsorship: Xunta de Galicia; ED431B 2017/01 [es_ES]
dc.identifier.citation: Vilares, D., Strzyz, M., Søgaard, A., & Gómez-Rodríguez, C. (2020). Parsing as Pretraining. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 9114-9121. https://doi.org/10.1609/aaai.v34i05.6446 [es_ES]
dc.identifier.doi: 10.1609/aaai.v34i05.6446
dc.identifier.uri: http://hdl.handle.net/2183/24893
dc.language.iso: eng [es_ES]
dc.publisher: AAAI Press
dc.relation.projectID: info:eu-repo/grantAgreement/EC/H2020/714150 [es_ES]
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/TIN2017-85160-C2-1-R/ES/AVANCES EN NUEVOS SISTEMAS DE EXTRACCION DE RESPUESTAS CON ANALISIS SEMANTICO Y APRENDIZAJE PROFUNDO/
dc.relation.uri: https://doi.org/10.1609/aaai.v34i05.6446
dc.rights.accessRights: open access [es_ES]
dc.subject: Natural language processing [es_ES]
dc.subject: Parsing [es_ES]
dc.subject: Sequence labeling [es_ES]
dc.subject: Pretraining [es_ES]
dc.title: Parsing as Pretraining [es_ES]
dc.type: conference output [es_ES]
dspace.entity.type: Publication
relation.isAuthorOfPublication: 37dabbe9-f54f-43bb-960e-0bf3ac7e54eb
relation.isAuthorOfPublication: e70a3969-39f6-4458-9339-3b71756fa56e
relation.isAuthorOfPublication.latestForDiscovery: 37dabbe9-f54f-43bb-960e-0bf3ac7e54eb

Files

Original bundle

Name: Vilares_David_2020_Parsing_as_Pretraining.pdf
Size: 817.06 KB
Format: Adobe Portable Document Format