Multidimensional Affective Analysis for Low-Resource Languages: A Use Case with Guarani-Spanish Code-Switching Language
Not accessible until 2024-07-15
Use this link to cite
http://hdl.handle.net/2183/34367
Collections
- GI-LYS - Artigos [43]
Metadata
Title
Multidimensional Affective Analysis for Low-Resource Languages: A Use Case with Guarani-Spanish Code-Switching Language
Date
2023
Bibliographic citation
Agüero-Torales, M.M., López-Herrera, A.G. & Vilares, D. Multidimensional Affective Analysis for Low-Resource Languages: A Use Case with Guarani-Spanish Code-Switching Language. Cogn Comput 15, 1391–1406 (2023). https://doi.org/10.1007/s12559-023-10165-0
Abstract
This paper focuses on text-based affective computing for Jopara, a code-switching language that combines Guarani and Spanish. First, we collected a dataset of tweets primarily written in Guarani and annotated them for three widely used dimensions in sentiment analysis: (a) emotion recognition, (b) humor detection, and (c) offensive language identification. Then, we developed several neural network models, including large language models specifically designed for Guarani, and compared their performance against off-the-shelf multilingual and Spanish pre-trained models for the aforementioned dimensions. Our experiments show that language models incorporating Guarani during pre-training or pre-fine-tuning consistently achieve the best results, despite limited resources (a single 24-GB GPU and only 800K tokens). Notably, even a Guarani BERT model with just two layers of Transformers shows a favorable balance between accuracy and computational power, likely due to the inherent low-resource nature of the task. We present a comprehensive overview of corpus creation and model development for low-resource languages like Guarani, particularly in the context of its code-switching with Spanish, resulting in Jopara. Our findings shed light on the challenges and strategies involved in analyzing affective language in such linguistic contexts.
Keywords
Natural language processing
Sentiment analysis
Affective analysis
Code-switching
Low-resource languages
Publisher's version
Rights
This version of the article has been accepted for publication, after peer review, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s12559-023-10165-0. Use of this Accepted Version is subject to the publisher’s Accepted Manuscript terms of use: https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms
ISSN
1866-9964