Browsing by Author "Gómez-Rodríguez, Carlos"
Now showing items 1-20 of 77
-
4 and 7-bit Labeling for Projective and Non-Projective Dependency Trees
Gómez-Rodríguez, Carlos; Roca Rodríguez, Diego; Vilares, David (Association for Computational Linguistics, 2023-12)[Abstract]: We introduce an encoding for parsing as sequence labeling that can represent any projective dependency tree as a sequence of 4-bit labels, one per word. The bits in each word’s label represent (1) whether it ... -
A linguistic approach for determining the topics of Spanish Twitter messages
Vilares, David; Alonso, Miguel A.; Gómez-Rodríguez, Carlos (SAGE Publications & CILIP, 2015)[Abstract]: The vast number of opinions and reviews provided in Twitter is helpful in order to make interesting findings about a given industry, but given the huge number of messages published every day, it is important ... -
A non-projective greedy dependency parser with bidirectional LSTMs
Vilares, David; Gómez-Rodríguez, Carlos (Association for Computational Linguistics, 2017-08)[Abstract]: The LyS-FASTPARSE team present BIST-COVINGTON, a neural implementation of the Covington (2001) algorithm for non-projective dependency parsing. The bidirectional LSTM approach by Kiperwasser and Goldberg (2016) ... -
A syntactic approach for opinion mining on Spanish reviews
Vilares, David; Alonso, Miguel A.; Gómez-Rodríguez, Carlos (Cambridge University Press, 2015-01)[Abstract]: We describe an opinion mining system which classifies the polarity of Spanish texts. We propose an NLP approach that undertakes pre-processing, tokenisation and POS tagging of texts to then obtain the syntactic ... -
A Transition-Based Algorithm for Unrestricted AMR Parsing
Vilares, David; Gómez-Rodríguez, Carlos (Association for Computational Linguistics, 2018-06)[Abstract]: Non-projective parsing can be useful to handle cycles and reentrancy in AMR graphs. We explore this idea and introduce a greedy left-to-right non-projective transition-based parser. At each parsing configuration, ... -
A Unifying Theory of Transition-based and Sequence Labeling Parsing
Gómez-Rodríguez, Carlos; Strzyz, Michalina; Vilares, David (International Committee on Computational Linguistics, 2020-12)[Abstract]: We define a mapping from transition-based parsing algorithms that read sentences from left to right to sequence labeling encodings of syntactic trees. This not only establishes a theoretical relation between ... -
Una aproximación supervisada para la minería de opiniones sobre tuits en español en base a conocimiento lingüístico
Vilares, David; Alonso, Miguel A.; Gómez-Rodríguez, Carlos (Sociedad Española para el Procesamiento del Lenguaje Natural, 2013)[Abstract]: This article describes a system for classifying the polarity of tweets written in Spanish. It adopts a hybrid approach that combines linguistic knowledge obtained through NLP with ... -
Artificially Evolved Chunks for Morphosyntactic Analysis
Anderson, Mark; Vilares, David; Gómez-Rodríguez, Carlos (Association for Computational Linguistics, 2019-08)[Abstract]: We introduce a language-agnostic evolutionary technique for automatically extracting chunks from dependency treebanks. We evaluate these chunks on a number of morphosyntactic tasks, namely POS tagging, ... -
Assessment of Pre-Trained Models Across Languages and Grammars
Muñoz-Ortiz, Alberto; Vilares, David; Gómez-Rodríguez, Carlos (Association for Computational Linguistics, 2023-11)[Abstract]: We present an approach for assessing how multilingual large language models (LLMs) learn syntax in terms of multi-formalism syntactic structures. We aim to recover constituent and dependency structures by ... -
BERTbek: A Pretrained Language Model for Uzbek
Kuriyozov, Elmurod; Vilares, David; Gómez-Rodríguez, Carlos (European Language Resources Association (ELRA), 2024-05)[Abstract]: Recent advances in neural networks based language representation made it possible for pretrained language models to outperform previous models in many downstream natural language processing (NLP) tasks. These ... -
Bertinho: Galician BERT Representations
Vilares, David; García, Marcos; Gómez-Rodríguez, Carlos (Sociedad Española para el Procesamiento del Lenguaje Natural, 2021-03)[Abstract]: This paper presents a monolingual BERT model for Galician. We follow the recent trend that shows that it is feasible to build robust monolingual BERT models even for relatively low-resource languages, while ... -
Bracketing Encodings for 2-Planar Dependency Parsing
Strzyz, Michalina; Vilares, David; Gómez-Rodríguez, Carlos (International Committee on Computational Linguistics, 2020-12)[Abstract]: We present a bracketing-based encoding that can be used to represent any 2-planar dependency tree over a sentence of length n as a sequence of n labels, hence providing almost total coverage of crossing arcs ... -
Clasificación de polaridad en textos con opiniones en español mediante análisis sintáctico de dependencias
Vilares, David; Alonso, Miguel A.; Gómez-Rodríguez, Carlos (Sociedad Española para el Procesamiento del Lenguaje Natural, 2013)[Abstract]: This article describes an opinion mining system that classifies the polarity of Spanish texts. It proposes an NLP-based approach that involves performing segmentation, tokenization and ... -
Cognitive Constraints Built into Formal Grammars: Implications for Language Evolution
Gómez-Rodríguez, Carlos; Christiansen, Morten H.; Ferrer-i-Cancho, Ramon (Ravignani, A., Barbieri, C., Martins, M., Flaherty, M., Jadoul, Y., Lattenkamp, E., Little, H., Mudd, K., Verhoef, T., 2020-04-17)[Abstract]: We study the validity of the cognitive independence assumption using an ensemble of artificial syntactic structures from various classes of dependency grammars. Our findings show that memory limitations have ... -
Comparing neural- and N-gram-based language models for word segmentation
Doval, Yerai; Gómez-Rodríguez, Carlos (John Wiley and Sons Inc., 2019-02)[Abstract]: Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based ... -
Constituent Parsing as Sequence Labeling
Gómez-Rodríguez, Carlos; Vilares, David (Association for Computational Linguistics (ACL), 2018)[Abstract]: We introduce a method to reduce constituent parsing to sequence labeling. For each word w_t, it generates a label that encodes: (1) the number of ancestors in the tree that the words w_t and w_{t+1} have in common, ... -
Construction and evaluation of sentiment Datasets for low-resource languages: the case of Uzbek
Kuriyozov, Elmurod; Matlatipov, Sanatbek; Alonso, Miguel A.; Gómez-Rodríguez, Carlos (Springer, 2022-06)[Abstract]: To our knowledge, the majority of human language processing technologies for low-resource languages don’t have well-established linguistic resources for the development of sentiment analysis applications. ... -
Contrasting Linguistic Patterns in Human and LLM-Generated News Text
Muñoz-Ortiz, Alberto; Gómez-Rodríguez, Carlos; Vilares, David (Springer, 2024)[Abstract]: We conduct a quantitative analysis contrasting human-written English news text with comparable large language model (LLM) output from six different LLMs that cover three different families and four sizes in ... -
Creación de un treebank de dependencias universales mediante recursos existentes para lenguas próximas: el caso del gallego
García, Marcos; Gómez-Rodríguez, Carlos; Alonso, Miguel A. (Sociedad Española para el Procesamiento del Lenguaje Natural, 2016-09)[Abstract]: In this work we present a new strategy for creating treebanks for languages with few resources for syntactic parsing. The method consists of adapting and combining different treebanks annotated ... -
Cross-lingual Inflection as a Data Augmentation Method for Parsing
Muñoz-Ortiz, Alberto; Gómez-Rodríguez, Carlos; Vilares, David (Association for Computational Linguistics, 2022-05)[Abstract]: We propose a morphology-based method for low-resource (LR) dependency parsing. We train a morphological inflector for target LR languages, and apply it to related rich-resource (RR) treebanks to create ...