Evaluating Pixel Language Models on Non-Standardized Languages
| UDC.coleccion | Research | es_ES |
| UDC.conferenceTitle | COLING 2025 - International Conference on Computational Linguistics | es_ES |
| UDC.departamento | Ciencias da Computación e Tecnoloxías da Información | es_ES |
| UDC.endPage | 6419 | es_ES |
| UDC.grupoInv | Lingua e Sociedade da Información (LYS) | es_ES |
| UDC.institutoCentro | CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación | es_ES |
| UDC.startPage | 6412 | es_ES |
| dc.contributor.author | Muñoz-Ortiz, Alberto | |
| dc.contributor.author | Blaschke, Verena | |
| dc.contributor.author | Plank, Barbara | |
| dc.date.accessioned | 2025-05-21T15:09:18Z | |
| dc.date.available | 2025-05-21T15:09:18Z | |
| dc.date.issued | 2025-01 | |
| dc.description | Paper presented at: 31st International Conference on Computational Linguistics - COLING, January 19–24, 2025. | es_ES |
| dc.description.abstract | [Abstract]: We explore the potential of pixel-based models for transfer learning from standard languages to dialects. These models convert text into images that are divided into patches, enabling a continuous vocabulary representation that proves especially useful for out-of-vocabulary words common in dialectal data. Using German as a case study, we compare the performance of pixel-based models to token-based models across various syntactic and semantic tasks. Our results show that pixel-based models outperform token-based models in part-of-speech tagging, dependency parsing and intent detection for zero-shot dialect evaluation by up to 26 percentage points in some scenarios, though not in Standard German. However, pixel-based models fall short in topic classification. These findings emphasize the potential of pixel-based models for handling dialectal data, though further research should be conducted to assess their effectiveness in various linguistic contexts. | es_ES |
| dc.description.sponsorship | This work was funded by the European Research Council (ERC) Consolidator Grant DIALECT 101043235; SCANNER-UDC (PID2020-113230RB-C21) funded by MICIU/AEI/10.13039/501100011033; Xunta de Galicia (ED431C 2024/02); GAP (PID2022-139308OA-I00) funded by MICIU/AEI/10.13039/501100011033 and by ERDF, EU; Grant PRE2021-097001 funded by MICIU/AEI/10.13039/501100011033 and by ESF+ (predoctoral training grant associated to project PID2020-113230RB-C21); LATCHING (PID2023-147129OB-C21) funded by MICIU/AEI/10.13039/501100011033 and ERDF; and Centro de Investigación de Galicia “CITIC”, funded by the Xunta de Galicia through the collaboration agreement between the Consellería de Cultura, Educación, Formación Profesional e Universidades and the Galician universities for the reinforcement of the research centres of the Galician University System (CIGUS). | es_ES |
| dc.description.sponsorship | Xunta de Galicia; ED431C 2024/02 | es_ES |
| dc.identifier.citation | Alberto Muñoz-Ortiz, Verena Blaschke, and Barbara Plank. 2025. Evaluating Pixel Language Models on Non-Standardized Languages. In Proceedings of the 31st International Conference on Computational Linguistics, pages 6412–6419, Abu Dhabi, UAE. Association for Computational Linguistics. https://aclanthology.org/2025.coling-main.427/ | es_ES |
| dc.identifier.isbn | 9798891761964 | |
| dc.identifier.issn | 2951-2093 | |
| dc.identifier.uri | http://hdl.handle.net/2183/42054 | |
| dc.language.iso | eng | es_ES |
| dc.publisher | Association for Computational Linguistics (ACL) | es_ES |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PID2020-113230RB-C21/ES/MODELOS MULTITAREA DE ETIQUETADO SECUENCIAL PARA EL RECONOCIMIENTO DE ENTIDADES ENRIQUECIDO CON INFORMACIÓN LINGÜÍSTICA: SINTAXIS E INTEGRACIÓN MULTITAREA (SCANNER-UDC) | es_ES |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2022-139308OA-100/ES/REPRESENTACIONES ESTRUCTURADAS VERDES Y ENCHUFABLES | es_ES |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/PID2023-147129OB-C21/ES/TECNOLOGÍAS DEL LENGUAJE DESDE UNA PERSPECTIVA VERDE (LATCHING): DOMINIOS CON ESCASOS RECURSOS | es_ES |
| dc.relation.projectID | info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PRE2021-097001/ES/ | es_ES |
| dc.relation.uri | https://aclanthology.org/2025.coling-main.427/ | es_ES |
| dc.rights | Attribution 4.0 International | es_ES |
| dc.rights | ©2025 Association for Computational Linguistics | es_ES |
| dc.rights.accessRights | open access | es_ES |
| dc.rights.uri | http://creativecommons.org/licenses/by/3.0/es/ | * |
| dc.subject | Computational linguistics | es_ES |
| dc.subject | Pixel-based models | es_ES |
| dc.subject | Computer aided language translation | es_ES |
| dc.subject | Contrastive learning | es_ES |
| dc.subject | Transfer learning | es_ES |
| dc.subject | Zero-shot learning | es_ES |
| dc.title | Evaluating Pixel Language Models on Non-Standardized Languages | es_ES |
| dc.type | conference output | es_ES |
| dc.type.hasVersion | VoR | es_ES |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | edf1cde8-d272-4a73-bdd3-9be2361b7651 | |
| relation.isAuthorOfPublication.latestForDiscovery | edf1cde8-d272-4a73-bdd3-9be2361b7651 | |
Files
Original bundle
- Name: MunozOrtiz_Alberto_2025_Evaluating_Pixel_Language_Models.pdf
- Size: 238.67 KB
- Format: Adobe Portable Document Format