Use this link to cite:
http://hdl.handle.net/2183/37283
A comparison of statistical association measures for identifying dependency-based collocations in various languages
Bibliographic citation
Marcos Garcia, Marcos García Salido, and Margarita Alonso-Ramos. 2019. A comparison of statistical association measures for identifying dependency-based collocations in various languages. In Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), pages 49–59, Florence, Italy. Association for Computational Linguistics.
Abstract
This paper presents an exploration of different statistical association measures to automatically identify collocations from corpora in English, Portuguese, and Spanish. To evaluate the impact of the association measures, we manually annotated corpora with three different syntactic patterns of collocations (adjective-noun, verb-object and nominal compounds). We took advantage of the PARSEME 1.1 Shared Task corpora by selecting a subset of 155k tokens in the three aforementioned languages, in which we annotated 1,526 collocations with their Lexical Functions according to the Meaning-Text Theory. Using the resulting gold standard, we have carried out a comparison between frequency data and several well-known association measures, both symmetric and asymmetric. The results show that the combination of dependency triples with raw frequency information is as powerful as the best association measures in most syntactic patterns and languages. Furthermore, and despite the asymmetric behaviour of collocations, directional approaches perform worse than the symmetric ones in the extraction of these phraseological combinations.
Rights
Attribution 4.0 International








