Parsing as Pretraining

Vilares, David; Strzyz, Michalina; Søgaard, Anders; Gómez-Rodríguez, Carlos

UDC.coleccion	Investigación	es_ES
UDC.conferenceTitle	Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)	es_ES
UDC.departamento	Letras	es_ES
UDC.grupoInv	Lingua e Sociedade da Información (LYS)	es_ES
UDC.issue	5
UDC.volume	34
dc.contributor.author	Vilares, David
dc.contributor.author	Strzyz, Michalina
dc.contributor.author	Søgaard, Anders
dc.contributor.author	Gómez-Rodríguez, Carlos
dc.date.accessioned	2020-02-13T15:14:34Z
dc.date.available	2020-02-13T15:14:34Z
dc.date.issued	2020
dc.description.abstract	[Abstract] Recent analyses suggest that encoders pretrained for language modeling capture certain morpho-syntactic structure. However, probing frameworks for word vectors still do not report results on standard setups such as constituent and dependency parsing. This paper addresses this problem and does full parsing (on English) relying only on pretraining architectures, with no decoding. We first cast constituent and dependency parsing as sequence tagging. We then use a single feed-forward layer to directly map word vectors to labels that encode a linearized tree. This is used to: (i) see how far we can get on syntax modelling with just pretrained encoders, and (ii) shed some light on the syntax-sensitivity of different word vectors (by freezing the weights of the pretraining network during training). For evaluation, we use bracketing F1-score and LAS, and analyze in depth the differences across representations for span lengths and dependency displacements. The overall results surpass existing sequence tagging parsers on the PTB (93.5%) and end-to-end EN-EWT UD (78.8%).	es_ES
dc.description.sponsorship	We thank Mark Anderson and Daniel Hershcovich for their comments. DV, MS and CGR are funded by the ERC under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, grant No 714150), by the ANSWER-ASAP project (TIN2017-85160-C2-1-R) from MINECO, and by Xunta de Galicia (ED431B 2017/01). AS is funded by a Google Focused Research Award.	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431B 2017/01	es_ES
dc.identifier.citation	Vilares, D., Strzyz, M., Søgaard, A., & Gómez-Rodríguez, C. (2020). Parsing as Pretraining. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 9114-9121. https://doi.org/10.1609/aaai.v34i05.6446	es_ES
dc.identifier.doi	10.1609/aaai.v34i05.6446
dc.identifier.uri	http://hdl.handle.net/2183/24893
dc.language.iso	eng	es_ES
dc.publisher	AAAI Press
dc.relation.projectID	info:eu-repo/grantAgreement/EC/H2020/714150	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/TIN2017-85160-C2-1-R/ES/AVANCES EN NUEVOS SISTEMAS DE EXTRACCION DE RESPUESTAS CON ANALISIS SEMANTICO Y APRENDIZAJE PROFUNDO/
dc.relation.uri	https://doi.org/10.1609/aaai.v34i05.6446
dc.rights.accessRights	open access	es_ES
dc.subject	Natural language processing	es_ES
dc.subject	Parsing	es_ES
dc.subject	Sequence labeling	es_ES
dc.subject	Pretraining	es_ES
dc.title	Parsing as Pretraining	es_ES
dc.type	conference output	es_ES
dspace.entity.type	Publication
relation.isAuthorOfPublication	37dabbe9-f54f-43bb-960e-0bf3ac7e54eb
relation.isAuthorOfPublication	e70a3969-39f6-4458-9339-3b71756fa56e
relation.isAuthorOfPublication.latestForDiscovery	37dabbe9-f54f-43bb-960e-0bf3ac7e54eb
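
The abstract describes casting dependency parsing as sequence tagging, with one label per word encoding a linearized tree, and evaluating with LAS. The sketch below illustrates that general idea only. It assumes a simple relative-offset encoding (head position minus word position, plus the relation name), which is not necessarily the encoding used in the paper; the function names `encode`, `decode` and `las` and the example sentence are hypothetical.

```python
from typing import List, Tuple


def encode(heads: List[int], rels: List[str]) -> List[str]:
    """Turn a dependency tree into one label per word.

    heads[i] is the 1-based position of the head of word i+1 (0 = root).
    Each label is "<offset>_<relation>", with offset = head - position.
    This encoding is an illustrative assumption.
    """
    return [f"{head - pos}_{rel}"
            for pos, (head, rel) in enumerate(zip(heads, rels), start=1)]


def decode(labels: List[str]) -> List[Tuple[int, str]]:
    """Recover (head, relation) pairs from the per-word labels."""
    arcs = []
    for pos, label in enumerate(labels, start=1):
        offset, rel = label.split("_", 1)
        arcs.append((pos + int(offset), rel))
    return arcs


def las(gold: List[Tuple[int, str]], pred: List[Tuple[int, str]]) -> float:
    """Labeled attachment score: share of words with correct head and relation."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


# Hypothetical example: "She reads books", with "reads" as the root.
heads = [2, 0, 2]
rels = ["nsubj", "root", "obj"]
labels = encode(heads, rels)   # one tag per word, usable as tagging targets
arcs = decode(labels)          # round-trips back to the original tree
```

Under this framing a tagger only has to predict one label per word, so a single feed-forward layer over (frozen or fine-tuned) word vectors can act as the parser, and the tree is read off the labels without a separate decoding algorithm.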

Files

Original bundle

Name: Vilares_David_2020_Parsing_as_Pretraining.pdf
Size: 817.06 KB
Format: Adobe Portable Document Format

Collections

Investigación (FFIL)