Evaluating Pixel Language Models on Non-Standardized Languages

Muñoz-Ortiz, Alberto; Blaschke, Verena; Plank, Barbara

Use this link to cite:

http://hdl.handle.net/2183/42054

Files

MunozOrtiz_Alberto_2025_Evaluating_Pixel_Language_Models.pdf (238.67 KB)

Identifiers

URI: http://hdl.handle.net/2183/42054

Publication date

2025-01

Authors

Muñoz-Ortiz, Alberto

Blaschke, Verena

Plank, Barbara

Bibliographic citation

Alberto Muñoz-Ortiz, Verena Blaschke, and Barbara Plank. 2025. Evaluating Pixel Language Models on Non-Standardized Languages. In Proceedings of the 31st International Conference on Computational Linguistics, pages 6412–6419, Abu Dhabi, UAE. Association for Computational Linguistics. https://aclanthology.org/2025.coling-main.427/

Abstract

We explore the potential of pixel-based models for transfer learning from standard languages to dialects. These models convert text into images that are divided into patches, enabling a continuous vocabulary representation that proves especially useful for out-of-vocabulary words common in dialectal data. Using German as a case study, we compare the performance of pixel-based models to token-based models across various syntactic and semantic tasks. Our results show that pixel-based models outperform token-based models in part-of-speech tagging, dependency parsing and intent detection for zero-shot dialect evaluation by up to 26 percentage points in some scenarios, though not in Standard German. However, pixel-based models fall short in topic classification. These findings emphasize the potential of pixel-based models for handling dialectal data, though further research should be conducted to assess their effectiveness in various linguistic contexts.
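The patching step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not state how text is rendered or what patch size is used, so the function name `split_into_patches`, the list-of-rows image representation, the patch width, and the zero padding value are all assumptions made for illustration.

```python
from typing import List

# Assumption: a rendered line of text is a grayscale image stored as rows of ints.
Image = List[List[int]]


def split_into_patches(image: Image, patch_width: int) -> List[Image]:
    """Cut an image into consecutive fixed-width patches, left to right."""
    if patch_width <= 0:
        raise ValueError("patch_width must be positive")
    if not image or not image[0]:
        return []
    width = len(image[0])
    if any(len(row) != width for row in image):
        raise ValueError("all rows must have the same width")

    patches: List[Image] = []
    for start in range(0, width, patch_width):
        patch: Image = []
        for row in image:
            chunk = row[start:start + patch_width]
            # Pad the final patch (with 0, an assumed background value)
            # so that every patch has the same shape.
            chunk = chunk + [0] * (patch_width - len(chunk))
            patch.append(chunk)
        patches.append(patch)
    return patches


# Toy 2x5 "rendered line" split into patches of width 2.
toy = [[1, 2, 3, 4, 5],
       [6, 7, 8, 9, 10]]
patches = split_into_patches(toy, 2)
```

Because each patch is a block of pixels and not an entry in a fixed vocabulary, an unseen dialectal spelling still yields a valid input sequence, which is the property the abstract refers to as a continuous vocabulary representation.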

Description

Paper presented at: 31st International Conference on Computational Linguistics (COLING), January 19–24, 2025.

Keywords

Computational linguistics

Pixel-based models

Computer aided language translation

Contrastive learning

Transfer learning

Zero-shot learning

Editor version

https://aclanthology.org/2025.coling-main.427/

Rights

Except where otherwise noted, this item's license is described as Attribution 4.0 International

Collections

Investigación (FFIL)
