On-Device Automatic Speech Recognition for Low-Resource Languages in Mixed Reality Industrial Metaverse Applications: Practical Guidelines and Evaluation of a Shipbuilding Application in Galician

Valladares Poncela, Antón; Fraga-Lamas, Paula; Fernández-Caramés, Tiago M.

Files

Valladares_Poncela_Anton_2025_On-Device_Automatic_Speech_Recognition_for_Low-Resource_Languages_in_Mixed_Reality_Industrial_Metaverse_Applications.pdf (2.42 MB)

Identifiers

URI: http://hdl.handle.net/2183/41982

DOI: 10.1109/ACCESS.2025.3564137

Publication date

2025-04

Authors

Valladares Poncela, Antón

Fraga-Lamas, Paula

Fernández-Caramés, Tiago M.

Bibliographic citation

A. Valladares-Poncela, P. Fraga-Lamas and T. M. Fernández-Caramés, "On-Device Automatic Speech Recognition for Low-Resource Languages in Mixed Reality Industrial Metaverse Applications: Practical Guidelines and Evaluation of a Shipbuilding Application in Galician," in IEEE Access, vol. 13, pp. 77017-77038, 2025, doi: 10.1109/ACCESS.2025.3564137

Abstract

As Metaverse and Mixed Reality (MR) technologies continue to evolve, enabling natural and intuitive user interfaces is crucial. However, supporting low-resource languages in these advanced systems presents unique challenges. This article explores the development and deployment of an on-device Automatic Speech Recognition (ASR) system for Galician, a low-resource language spoken by fewer than 3 million people, implemented on the Microsoft HoloLens 2 MR glasses. The system prioritizes data privacy and security by eliminating the need for Internet connectivity or external processing. Key implementation choices, including software and libraries, are detailed, along with optimization strategies for minimizing latency. Performance evaluations, which take simulated noisy environments into account, demonstrate the high accuracy and low latency of the system, proving its effectiveness as an on-device ASR system for current and future Metaverse applications. To demonstrate its practical utility in an industrial scenario such as a shipyard, the system has been incorporated into an electrical outfitting application for Navantia, one of the largest shipbuilding companies in the world. The results show a Character Error Rate (CER) below 6% and a latency under 3 seconds using an ARM64 quantized model, which validates the system for real-time voice control in industrial MR environments.