Use this link to cite:

http://hdl.handle.net/2183/163

Integrating external dictionaries into part-of-speech taggers

Bibliographic citation

Angelova, G.; Bontcheva, K.; Mitkov, R.; Nicolov, N.; Nikolov, N. (eds.), Proceedings of the Euroconference on Recent Advances in Natural Language Processing (RANLP-2001), Tzigov Chark (Bulgaria), pp. 122-128.

Abstract

The best performance in part-of-speech tagging has been obtained with stochastic methods such as hidden Markov models. The parameters of a hidden Markov model for tagging can be estimated from tagged corpora. For some languages, however, only very short training texts are currently available, while very large dictionaries exist. These dictionaries can provide very useful information for improving the treatment of unknown words. In this paper we present new strategies for integrating external dictionaries into a stochastic tagging framework. Instead of the most intuitive method, adding one, we propose the use of the Good-Turing formulas, which produce less distortion of the model being estimated. This technique guarantees good performance in the automatic processing of languages for which reference texts hardly exist.
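The abstract does not give the paper's exact formulas, so the following is only a minimal sketch of the two estimators it names, not the authors' method. It assumes that every (word, tag) pair listed in the dictionary but absent from the training corpus starts with a count of zero, and it works on joint (word, tag) counts, a simplification of the per-tag emission probabilities an HMM tagger actually uses. All function names and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # hypothetical (word, tag) pair


def add_one_probs(corpus: Dict[Pair, int], dictionary: Iterable[Pair]) -> Dict[Pair, float]:
    """Add-one (Laplace): every known pair gets (c + 1) / (N + V)."""
    pairs = set(corpus) | set(dictionary)
    n = sum(corpus.values())  # N: tokens in the training corpus
    v = len(pairs)            # V: distinct pairs from corpus and dictionary
    return {p: (corpus.get(p, 0) + 1) / (n + v) for p in pairs}


def good_turing_probs(corpus: Dict[Pair, int], dictionary: Iterable[Pair]) -> Dict[Pair, float]:
    """Basic Good-Turing: c* = (c + 1) * N_{c+1} / N_c, where N_c is the number
    of pairs seen exactly c times (N_0 = dictionary pairs absent from the corpus).
    When N_{c+1} == 0 the raw count is kept, so the result is renormalised."""
    pairs = set(corpus) | set(dictionary)
    raw = {p: corpus.get(p, 0) for p in pairs}
    n_c = Counter(raw.values())  # frequency of frequencies
    adjusted = {}
    for p, c in raw.items():
        n_next = n_c.get(c + 1, 0)
        adjusted[p] = (c + 1) * n_next / n_c[c] if n_next else float(c)
    total = sum(adjusted.values())
    return {p: a / total for p, a in adjusted.items()}


def unseen_mass(probs: Dict[Pair, float], corpus: Dict[Pair, int]) -> float:
    """Total probability assigned to dictionary pairs absent from the corpus."""
    return sum(p for pair, p in probs.items() if pair not in corpus)


# Toy data, invented for illustration only.
corpus = {("the", "DET"): 3, ("a", "DET"): 2, ("runs", "VERB"): 2,
          ("dog", "NOUN"): 1, ("cat", "NOUN"): 1, ("barks", "VERB"): 1}
small_dict = [("bird", "NOUN"), ("sings", "VERB")]
large_dict = [(f"noun{i}", "NOUN") for i in range(20)]

for name, d in [("small", small_dict), ("large", large_dict)]:
    print(name,
          round(unseen_mass(add_one_probs(corpus, d), corpus), 3),
          round(unseen_mass(good_turing_probs(corpus, d), corpus), 3))
# small 0.111 0.231
# large 0.556 0.231
```

In this toy setting, the share of probability given to dictionary-only pairs under add-one grows with every entry added to the dictionary. Under Good-Turing, that total share depends only on the number of pairs seen exactly once in the corpus, and it is split among however many dictionary pairs are unseen. Keeping the raw count when N_{c+1} is zero is a crude fallback; practical implementations usually smooth the N_c curve instead.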