Semantic Relation Extraction. Resources, Tools and Strategies

García, Marcos

Files

Garcia_Marcos_2016_Semantic_Relation_Extraction_Resources_Tools_and_Strategies.pdf (550.86 KB)

Identifiers

URI: http://hdl.handle.net/2183/19316

Publication date

2016-07

Authors

García, Marcos

Bibliographic citation

Marcos Garcia, Semantic Relation Extraction. Resources, Tools and Strategies, in João Silva, Ricardo Ribeiro, Paulo Quaresma, André Adami, António Branco (eds.), Computational Processing of the Portuguese Language. 12th International Conference, PROPOR 2016, Tomar, Portugal, July 13-15, 2016, Proceedings, volume 9727 of Lecture Notes in Artificial Intelligence, pp. 141-152, Springer, 2016.

Abstract

Relation extraction is a subtask of information extraction that aims to obtain instances of semantic relations present in texts. This information can be arranged in machine-readable formats, useful for several applications that need structured semantic knowledge. The work presented in this paper explores different strategies to automate the extraction of semantic relations from texts in Portuguese, Galician and Spanish. Both machine learning (distant-supervised and supervised) and rule-based techniques are investigated, and the impact of the different levels of linguistic knowledge is analyzed for the various approaches. Regarding domains, the experiments focus on the extraction of encyclopedic knowledge, by means of the development of biographical relation classifiers (in a closed domain) and the evaluation of an open information extraction tool. To implement the extraction systems, several natural language processing tools have been built for the three research languages, from sentence splitting and tokenization modules to part-of-speech taggers, named entity recognizers and coreference resolution systems. Furthermore, several lexica and corpora have been compiled and enriched with different levels of linguistic annotation, which are useful for both training and testing probabilistic and symbolic models. As a result of this work, new resources and tools are available for the automated processing of texts in Portuguese, Galician and Spanish.