Construction and evaluation of sentiment datasets for low-resource languages: the case of Uzbek

Kuriyozov, Elmurod; Matlatipov, Sanatbek; Alonso, Miguel A.; Gómez-Rodríguez, Carlos

UDC.coleccion	Investigación	es_ES
UDC.conferenceTitle	LTC: Language and Technology Conference	es_ES
UDC.departamento	Letras	es_ES
UDC.endPage	243	es_ES
UDC.grupoInv	Lingua e Sociedade da Información (LYS)	es_ES
UDC.startPage	232	es_ES
UDC.volume	2019	es_ES
dc.contributor.author	Kuriyozov, Elmurod
dc.contributor.author	Matlatipov, Sanatbek
dc.contributor.author	Alonso, Miguel A.
dc.contributor.author	Gómez-Rodríguez, Carlos
dc.date.accessioned	2024-10-31T15:42:13Z
dc.date.available	2024-10-31T15:42:13Z
dc.date.issued	2022-06
dc.description	This is the Author Accepted Manuscript. This version of the conference paper has been accepted for publication, after peer review and is subject to Springer Nature’s AM terms of use (https://www.springernature.com/gp/open-science/policies/accepted-manuscript-terms), but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/978-3-031-05328-3_15.	es_ES
dc.description	Conference paper presented at: 9th Language and Technology Conference, LTC 2019, Poznan, Poland, May 17–19, 2019.	es_ES
dc.description.abstract	[Abstract]: To our knowledge, most low-resource languages lack well-established linguistic resources for developing sentiment analysis applications. Such tools and resources are urgently needed to overcome NLP barriers, so that language technologies can deliver more benefits for these languages. In this paper, we address that gap for Uzbek by providing the first annotated corpora for Uzbek polarity classification. Our methodology consists of collecting a medium-size manually annotated dataset and creating a larger dataset by automatically translating existing resources. We then use these datasets to train what, to our knowledge, are the first sentiment analysis models for the Uzbek language, using both traditional machine learning techniques and recent deep learning models. Both sets of techniques achieve similar accuracy (the best model on the manually annotated test set is a convolutional neural network with 88.89% accuracy, and on the translated set, a logistic regression with 89.56% accuracy), with the accuracy of the deep learning models being limited by the quality of the available pre-trained word embeddings.	es_ES
dc.description.sponsorship	This work has received funding from ERDF/MICINN-AEI (ANSWER-ASAP, TIN2017-85160-C2-1-R; SCANNER-UDC, PID2020-113230RB-C21), from Xunta de Galicia (ED431C 2020/11), and from Centro de Investigación de Galicia “CITIC”, funded by Xunta de Galicia and the European Union (ERDF - Galicia 2014–2020 Program), by grant ED431G 2019/01. Elmurod Kuriyozov was funded for his PhD by El-Yurt-Umidi Foundation under the Cabinet of Ministers of the Republic of Uzbekistan.	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431C 2020/11	es_ES
dc.description.sponsorship	Xunta de Galicia; ED431G 2019/01	es_ES
dc.identifier.citation	Kuriyozov, E., Matlatipov, S., Alonso, M.A., Gómez-Rodríguez, C. (2022). Construction and Evaluation of Sentiment Datasets for Low-Resource Languages: The Case of Uzbek. In: Vetulani, Z., Paroubek, P., Kubis, M. (eds) Human Language Technology. Challenges for Computer Science and Linguistics. LTC 2019. Lecture Notes in Computer Science, vol 13212. Springer, Cham. https://doi.org/10.1007/978-3-031-05328-3_15	es_ES
dc.identifier.doi	10.1007/978-3-031-05328-3_15
dc.identifier.isbn	978-3-031-05327-6
dc.identifier.issn	0302-9743
dc.identifier.uri	http://hdl.handle.net/2183/39913
dc.language.iso	eng	es_ES
dc.publisher	Springer	es_ES
dc.relation.ispartofseries	Lecture Notes in Computer Science (LNCS), including its subseries Lecture Notes in Artificial Intelligence (LNAI) and Lecture Notes in Bioinformatics (LNBI)	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/TIN2017-85160-C2-1-R/ES/AVANCES EN NUEVOS SISTEMAS DE EXTRACCION DE RESPUESTAS CON ANALISIS SEMANTICO Y APRENDIZAJE PROFUNDO/	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113230RB-C21/ES/MODELOS MULTITAREA DE ETIQUETADO SECUENCIAL PARA EL RECONOCIMIENTO DE ENTIDADES ENRIQUECIDO CON INFORMACIÓN LINGÜÍSTICA: SINTAXIS E INTEGRACIÓN MULTITAREA (SCANNER-UDC)	es_ES
dc.relation.uri	https://doi.org/10.1007/978-3-031-05328-3_15	es_ES
dc.rights	© 2022 Springer Nature Switzerland AG. Subject to Springer Nature’s AM terms of use (https://www.springernature.com/gp/open-science/policies/accepted-manuscript-terms).	es_ES
dc.rights.accessRights	open access	es_ES
dc.subject	Sentiment analysis	es_ES
dc.subject	Low-resource languages	es_ES
dc.subject	Uzbek language	es_ES
dc.title	Construction and evaluation of sentiment datasets for low-resource languages: the case of Uzbek	es_ES
dc.type	conference output	es_ES
dspace.entity.type	Publication
relation.isAuthorOfPublication	1318edb8-3967-465c-a267-146624c05837
relation.isAuthorOfPublication	e70a3969-39f6-4458-9339-3b71756fa56e
relation.isAuthorOfPublication.latestForDiscovery	1318edb8-3967-465c-a267-146624c05837

Files

Original bundle

Name: Kuriyozov_Elmurod_2022_Construction_and_evaluation_of_sentiment_Datasets_for_Low_Resource-Languages.pdf
Size: 543.03 KB
Format: Adobe Portable Document Format

Collections

Investigación (FFIL)