Use this link to cite:
http://hdl.handle.net/2183/41982
On-Device Automatic Speech Recognition for Low-Resource Languages in Mixed Reality Industrial Metaverse Applications: Practical Guidelines and Evaluation of a Shipbuilding Application in Galician
Bibliographic citation
A. Valladares-Poncela, P. Fraga-Lamas and T. M. Fernández-Caramés, "On-Device Automatic Speech Recognition for Low-Resource Languages in Mixed Reality Industrial Metaverse Applications: Practical Guidelines and Evaluation of a Shipbuilding Application in Galician," in IEEE Access, vol. 13, pp. 77017-77038, 2025, doi: 10.1109/ACCESS.2025.3564137
Abstract
[Abstract]: As the Metaverse and Mixed Reality (MR) technologies continue to evolve, enabling natural and intuitive user interfaces is crucial. However, supporting low-resource languages in these advanced systems presents unique challenges. This article explores the development and deployment of an on-device Automatic Speech Recognition (ASR) system for Galician, a low-resource language spoken by fewer than 3 million people, implemented on the Microsoft HoloLens 2 MR glasses. The system prioritizes data privacy and security by eliminating the need for Internet connectivity or external processing. Key implementation choices, including software and libraries, are detailed, along with optimization strategies for minimizing latency. Performance evaluations, which include simulated noisy environments, demonstrate the high accuracy and low latency of the system, supporting its suitability as an on-device ASR system for current and future Metaverse applications. To demonstrate its practical utility in an industrial scenario such as a shipyard, the system has been integrated into an electrical outfitting application for Navantia, one of the largest shipbuilding companies in the world. The results show a Character Error Rate (CER) below 6% and a latency under 3 seconds using an ARM64 quantized model, which validates the system for real-time voice control in industrial MR environments.
Description
Editor version
Rights
Attribution 4.0 International (CC BY 4.0)