Show simple item record

dc.contributor.author	Kuriyozov, Elmurod
dc.contributor.author	Vilares, David
dc.contributor.author	Gómez-Rodríguez, Carlos
dc.date.accessioned	2024-09-05T10:13:24Z
dc.date.available	2024-09-05T10:13:24Z
dc.date.issued	2024-05
dc.identifier.citation	Elmurod Kuriyozov, David Vilares, and Carlos Gómez-Rodríguez. 2024. BERTbek: A Pretrained Language Model for Uzbek. In Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024, pages 33–44, Torino, Italia. ELRA and ICCL. https://aclanthology.org/2024.sigul-1.5	es_ES
dc.identifier.uri	http://hdl.handle.net/2183/38881
dc.description	All the code used in this work is openly available at https://github.com/elmurod1202/BERTbek. The BERTbek models have also been uploaded to the HuggingFace Models Hub at https://huggingface.co/elmurod1202/bertbek-news-big-cased.	es_ES
dc.description.abstract	[Abstract]: Recent advances in neural network-based language representation have made it possible for pretrained language models to outperform previous models on many downstream natural language processing (NLP) tasks. These pretrained language models have also shown that, if large enough, they exhibit good few-shot abilities, which is especially beneficial for low-resource scenarios. In this respect, although some large-scale multilingual pretrained language models are available, language-specific pretrained models have been shown to be more accurate in monolingual evaluation setups. In this work, we present BERTbek, pretrained language models based on the BERT (Bidirectional Encoder Representations from Transformers) architecture for the low-resource Uzbek language. We also provide a comprehensive evaluation of the models on a number of NLP tasks: sentiment analysis, multi-label topic classification, and named entity recognition, comparing the models with various machine learning methods as well as multilingual BERT (mBERT). Experimental results indicate that our models outperform mBERT and other task-specific baseline models in all three tasks. Additionally, we show the impact of training data size and quality on the downstream performance of BERT models by training three different models with different text sources and corpus sizes.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	European Language Resources Association (ELRA)	es_ES
dc.relation.uri	https://aclanthology.org/2024.sigul-1.5	es_ES
dc.rights	Atribución-NoComercial 3.0 España	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc/3.0/es/	*
dc.subject	BERT	es_ES
dc.subject	language modeling	es_ES
dc.subject	low-resource languages	es_ES
dc.subject	natural language processing	es_ES
dc.subject	Uzbek language	es_ES
dc.title	BERTbek: A Pretrained Language Model for Uzbek	es_ES
dc.type	info:eu-repo/semantics/conferenceObject	es_ES
dc.rights.access	info:eu-repo/semantics/openAccess	es_ES
UDC.startPage	33	es_ES
UDC.endPage	44	es_ES
UDC.conferenceTitle	SIGUL 2024	es_ES
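
The description field above points to the released BERTbek checkpoints on the HuggingFace Models Hub. As a minimal usage sketch (not part of this record; it assumes the Hub repository elmurod1202/bertbek-news-big-cased exposes standard BERT weights and tokenizer files compatible with the transformers AutoModel interface), the model could be loaded and queried for masked-token predictions roughly as follows; the Uzbek example sentence is illustrative only.

from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

# Model ID taken from the dc.description field above; loading it this way
# assumes standard BERT-style weights and tokenizer files on the Hub.
MODEL_ID = "elmurod1202/bertbek-news-big-cased"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForMaskedLM.from_pretrained(MODEL_ID)
model.eval()

# Illustrative Uzbek sentence with one masked token (hypothetical example).
text = f"Toshkent O'zbekistonning {tokenizer.mask_token} shahridir."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and print the five most likely fillers.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_pos[0]].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))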


Files in this item


This item appears in the following collection(s)
