Building a New Sentiment Analysis Dataset for Uzbek Language and Creating Baseline Models

Kuriyozov, Elmurod; Matlatipov, Sanatbek

Use this link to cite:

http://hdl.handle.net/2183/23667

Files

kuriyozov_elmurod_matlatipov_sanatbek_2019_buiding_new_sentiment_analysis_dataset_for_uzbek_language_and_creating_baseline_models.pdf (161.82 KB)

Identifiers

URI: http://hdl.handle.net/2183/23667

Publication date

2019-08-02

Authors

Kuriyozov, Elmurod

Matlatipov, Sanatbek

Bibliographic citation

Kuriyozov, E.; Matlatipov, S. Building a New Sentiment Analysis Dataset for Uzbek Language and Creating Baseline Models. Proceedings 2019, 21, 37.

Abstract

Making natural language processing technologies available for low-resource languages is an important goal for improving access to technology in their communities of speakers. In this paper, we provide the first annotated corpora for polarity classification for the Uzbek language. Our methodology consists of collecting a medium-size manually annotated dataset and a larger dataset automatically translated from existing resources. We then use these datasets to train sentiment analysis models for Uzbek, using both traditional machine learning techniques and recent deep learning models.