Browse Lingua e Sociedade da Información (Language in the Information Society) (LYS) by title

Artificially Evolved Chunks for Morphosyntactic Analysis

Anderson, Mark; Vilares, David; Gómez-Rodríguez, Carlos (Association for Computational Linguistics, 2019-08)

[Abstract]: We introduce a language-agnostic evolutionary technique for automatically extracting chunks from dependency treebanks. We evaluate these chunks on a number of morphosyntactic tasks, namely POS tagging, ...

Assessment of Pre-Trained Models Across Languages and Grammars

Muñoz-Ortiz, Alberto; Vilares, David; Gómez-Rodríguez, Carlos (Association for Computational Linguistics, 2023-11)

[Abstract]: We present an approach for assessing how multilingual large language models (LLMs) learn syntax in terms of multi-formalism syntactic structures. We aim to recover constituent and dependency structures by ...

Better, Faster, Stronger Sequence Tagging Constituent Parsers

Vilares, David; Abdou, Mostafa; Søgaard, Anders (Association for Computational Linguistics, 2019-06)

[Abstract]: Sequence tagging models for constituent parsing are faster, but less accurate than other types of parsers. In this work, we address the following weaknesses of such constituent parsers: (a) high error rates ...

Bracketing Encodings for 2-Planar Dependency Parsing

Strzyz, Michalina; Vilares, David; Gómez-Rodríguez, Carlos (International Committee on Computational Linguistics, 2020-12)

[Abstract]: We present a bracketing-based encoding that can be used to represent any 2-planar dependency tree over a sentence of length n as a sequence of n labels, hence providing almost total coverage of crossing arcs ...

Building a New Sentiment Analysis Dataset for Uzbek Language and Creating Baseline Models

Kuriyozov, Elmurod; Matlatipov, Sanatbek (2019-08-02)

[Abstract] Making natural language processing technologies available for low-resource languages is an important goal to improve the access to technology in their communities of speakers. In this paper, we provide the first ...

Clasificación de polaridad en textos con opiniones en español mediante análisis sintáctico de dependencias

Vilares, David; Alonso, Miguel A.; Gómez-Rodríguez, Carlos (Sociedad Española para el Procesamiento del Lenguaje Natural, 2013)

[Abstract]: This article describes an opinion mining system that classifies the polarity of texts in Spanish. An NLP-based approach is proposed that involves performing segmentation, tokenization and ...

Cognitive Constraints Built into Formal Grammars: Implications for Language Evolution

Gómez-Rodríguez, Carlos; Christiansen, Morten H.; Ferrer-i-Cancho, Ramon (Ravignani, A., Barbieri, C., Martins, M., Flaherty, M., Jadoul, Y., Lattenkamp, E., Little, H., Mudd, K., Verhoef, T., 2020-04-17)

[Abstract] We study the validity of the cognitive independence assumption using an ensemble of artificial syntactic structures from various classes of dependency grammars. Our findings show that memory limitations have ...

Comparing neural- and N-gram-based language models for word segmentation

Doval, Yerai; Gómez-Rodríguez, Carlos (John Wiley and Sons Inc., 2019-02)

[Abstract]: Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based ...

Constituent Parsing as Sequence Labeling

Gómez-Rodríguez, Carlos; Vilares, David (Association for Computational Linguistics (ACL), 2018)

[Abstract]: We introduce a method to reduce constituent parsing to sequence labeling. For each word w_t, it generates a label that encodes: (1) the number of ancestors in the tree that the words w_t and w_{t+1} have in common, ...

Construcción de una lista de colocaciones para medir la competencia colocacional

Orol-González, Ana (Centro Virtual Cervantes, 2015)

[Abstract] The aim of this work is to create a list of Spanish collocations for assessment purposes. For the creation of this list we have followed a set of previously established criteria which are based on lists of frequent ...

Creación de un treebank de dependencias universales mediante recursos existentes para lenguas próximas: el caso del gallego

García, Marcos; Gómez-Rodríguez, Carlos; Alonso, Miguel A. (Sociedad Española para el Procesamiento del Lenguaje Natural, 2016-09)

[Abstract] In this work we present a new strategy for creating treebanks for languages with few resources for syntactic parsing. The method consists of adapting and combining different treebanks annotated ...

Cross-lingual Inflection as a Data Augmentation Method for Parsing

Muñoz-Ortiz, Alberto; Gómez-Rodríguez, Carlos; Vilares, David (Association for Computational Linguistics, 2022-05)

[Abstract]: We propose a morphology-based method for low-resource (LR) dependency parsing. We train a morphological inflector for target LR languages, and apply it to related rich-resource (RR) treebanks to create ...

Dependency parsing with bottom-up Hierarchical Pointer Networks

Fernández-González, Daniel; Gómez-Rodríguez, Carlos (Elsevier, 2023-03)

[Abstract] Dependency parsing is a crucial step towards deep language understanding and, therefore, widely demanded by numerous Natural Language Processing applications. In particular, left-to-right and top-down transition-based ...

Detecting Perspectives in Political Debates

Vilares, David; He, Yulan (Association for Computational Linguistics, 2017-09)

[Abstract]: We explore how to detect people’s perspectives that occupy a certain proposition. We propose a Bayesian modelling approach where topics (or propositions) and their associated perspectives (or viewpoints) are ...

Developing Open-Source Roguelike Games for Visually-Impaired Players by Using Low-Complexity NLP Techniques

Fernández-Núñez, Luis; Penas, Darío; Viteri Letamendía, Jorge; Vilares, Jesús (MDPI, 2020-08-19)

[Abstract] The prominent graphic component of video games greatly limits the accessibility of this type of entertainment by visually impaired users. We make here an overview of the first games developed within an initiative ...

Discontinuous Constituent Parsing as Sequence Labeling

Vilares, David; Gómez-Rodríguez, Carlos (Association for Computational Linguistics, 2020-11)

[Abstract]: This paper reduces discontinuous parsing to sequence labeling. It first shows that existing reductions for constituent parsing as labeling do not support discontinuities. Second, it fills this gap and proposes ...

Discontinuous grammar as a foreign language

Fernández-González, Daniel; Gómez-Rodríguez, Carlos (Elsevier, 2023-03)

[Abstract] In order to achieve deep natural language understanding, syntactic constituent parsing is a vital step, highly demanded by many artificial intelligence systems to process both text and speech. One of the most ...

Discovering Topics in Twitter About the COVID-19 Outbreak in Spain

Agüero-Torales, Marvin M.; Vilares, David; López-Herrera, Antonio G. (Sociedad Española de Procesamiento del Lenguaje Natural, 2021)

[Abstract] In this work, we analyze what users have been discussing on Twitter during the beginning of the pandemic caused by COVID-19. Specifically, we analyze three distinct phases of the crisis of the ...

EN-ES-CS: An English-Spanish Code-Switching Twitter Corpus for Multilingual Sentiment Analysis

Vilares, David; Alonso, Miguel A.; Gómez-Rodríguez, Carlos (European Language Resources Association (ELRA), 2016-05)

[Abstract]: Code-switching texts are those that contain terms in two or more different languages, and they appear increasingly often in social media. The aim of this paper is to provide a resource to the research community ...

Entity linking with distributional semantics

Gamallo, Pablo; García, Marcos (Springer, 2016-07)

[Abstract] Entity Linking (EL) consists in linking name mentions in a given text with their referring entities in external knowledge bases such as DBpedia/Wikipedia. In this paper, we propose an EL approach whose main ...