A Linearization Framework for Dependency and Constituent Trees

Roca Rodríguez, Diego

dc.contributor.advisor	Vilares, David
dc.contributor.advisor	Gómez-Rodríguez, Carlos
dc.contributor.author	Roca Rodríguez, Diego
dc.contributor.other	Universidade da Coruña. Facultade de Informática	es_ES
dc.date.accessioned	2022-10-31T18:04:35Z
dc.date.available	2022-10-31T18:04:35Z
dc.date.issued	2022
dc.identifier.uri	http://hdl.handle.net/2183/31925
dc.description.abstract	[Abstract]: Parsing is a core natural language processing problem in which, given a raw input sentence, a model automatically produces a structured output that represents its syntactic structure. The most common formalisms in this field are constituent and dependency parsing. Although the two formalisms differ, they share limitations, in particular the limited speed of the models that produce the desired representation and the lack of a common representation that would allow any end-to-end neural system to predict both. Casting both parsing tasks as a sequence labeling task addresses both problems. Several tree linearizations have been proposed in the last few years, but there is no common suite that facilitates their use within an integrated framework. In this work, we will develop such a system. On the one hand, the system will be able to (i) encode syntactic trees according to the desired syntactic formalism and linearization function, and (ii) decode linearized trees into their original representation. On the other hand, (iii) we will also train several neural sequence labeling systems to perform parsing from those labels and compare the results.	es_ES
dc.description.abstract	[Summary]: Syntactic parsing is a central task in natural language processing in which, given a sentence, an output representing its syntactic structure is produced. The most popular formalisms are constituent and dependency parsing. Although they are fundamentally different, they share certain limitations, such as the slowness of the models used to predict them and the lack of a common representation that would allow them to be predicted with general-purpose neural systems. Transforming both formalisms into a sequence labeling task solves both problems. In recent years, different ways of linearizing syntactic trees have been proposed, but a unified software package that could produce representations for both formalisms within a single system was still lacking. This work will develop such a system. On the one hand, it will make it possible to (i) linearize syntactic trees in the desired formalism and linearization function and (ii) decode linearized trees back to their original format. On the other hand, several sequence labeling models will also be trained and their results compared.	es_ES
dc.language.iso	eng	es_ES
dc.rights	Attribution 3.0 Spain	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by/3.0/es/	*
dc.subject	Natural language processing	es_ES
dc.subject	Tree Linearization	es_ES
dc.subject	Sequence Labeling	es_ES
dc.subject	Constituent Parsing	es_ES
dc.subject	Dependency Parsing	es_ES
dc.subject	Multi Task Learning	es_ES
dc.subject	Procesamiento Lenguaje Natural	es_ES
dc.subject	Linearización de árboles	es_ES
dc.subject	Etiquetado de secuencias	es_ES
dc.subject	Análisis Sintáctico de Constituyentes	es_ES
dc.subject	Análisis Sintáctico de Dependencias	es_ES
dc.subject	Aprendizaje Multitarea	es_ES
dc.title	A Linearization Framework for Dependency and Constituent Trees	es_ES
dc.type	info:eu-repo/semantics/bachelorThesis	es_ES
dc.rights.access	info:eu-repo/semantics/openAccess	es_ES
dc.description.traballos	Bachelor's thesis (UDC.FIC). Computer Engineering. Academic year 2021/2022	es_ES
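
The encode and decode steps named in the abstract, items (i) and (ii), can be illustrated with a minimal sketch. It assumes one simple dependency linearization, in which each word receives a label made of the relative offset to its head plus the relation name. The function names `encode` and `decode`, the label layout, and the example sentence are hypothetical and are not taken from the thesis, which may use different encodings.

```python
from typing import List, Tuple

# One label per word: (relative head offset, dependency relation).
# Offset 0 is reserved for the root, since no word can be its own head.
Label = Tuple[int, str]


def encode(heads: List[int], rels: List[str]) -> List[Label]:
    """Linearize a dependency tree into one label per word.

    heads[i] is the 1-based index of the head of word i+1 (0 = root).
    """
    labels = []
    for position, (head, rel) in enumerate(zip(heads, rels), start=1):
        offset = 0 if head == 0 else head - position
        labels.append((offset, rel))
    return labels


def decode(labels: List[Label]) -> Tuple[List[int], List[str]]:
    """Recover head indices and relations from the label sequence."""
    heads, rels = [], []
    for position, (offset, rel) in enumerate(labels, start=1):
        heads.append(0 if offset == 0 else position + offset)
        rels.append(rel)
    return heads, rels


# Hypothetical example: "She reads books", with "reads" as the root.
heads = [2, 0, 2]
rels = ["nsubj", "root", "obj"]
labels = encode(heads, rels)
print(labels)
print(decode(labels) == (heads, rels))
```

Because every word gets exactly one label, a generic sequence labeling model could be trained to predict these labels, and `decode` would then turn its predictions back into a tree. A real system would also need to handle ill-formed predictions, such as cycles or several roots, which this sketch does not attempt.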

Files in this item

Name: license_rdf
Size: 1.337Kb
Format: application/rdf+xml

Name: RocaRodriguez_Diego_TFG_2022.pdf
Size: 1.151Mb
Format: PDF