On-Device Automatic Speech Recognition for IIoT and Extended Reality Industrial Metaverse Applications

Valladares Poncela, Antón; Fraga-Lamas, Paula; Fernández-Caramés, Tiago M.

UDC.coleccion	Investigación
UDC.conferenceTitle	ECSA 2024
UDC.departamento	Enxeñaría de Computadores
UDC.grupoInv	Grupo de Tecnoloxía Electrónica e Comunicacións (GTEC)
UDC.institutoCentro	CITIC - Centro de Investigación de Tecnoloxías da Información e da Comunicación
dc.contributor.author	Valladares Poncela, Antón
dc.contributor.author	Fraga-Lamas, Paula
dc.contributor.author	Fernández-Caramés, Tiago M.
dc.date.accessioned	2026-04-21T08:50:23Z
dc.date.available	2026-04-21T08:50:23Z
dc.date.issued	2024-11
dc.description	Presented at The 11th International Electronic Conference on Sensors and Applications (ECSA-11), 26–28 November 2024, Online; Available online: https://sciforum.net/event/ecsa-11.
dc.description.abstract	[Abstract]: This paper presents a comprehensive study on enhancing Industrial Internet of Things (IIoT) and Industrial Metaverse applications through the integration of On-Device Automatic Speech Recognition (ASR) using Microsoft HoloLens 2 smart glasses. Specifically, this paper focuses on the utilization of the HoloLens 2 microphone array and sound capture APIs to benchmark the performance and accuracy of on-device ASR models. The evaluation of these models includes metrics such as Character Error Rate (CER), Word Error Rate (WER) and latency. In addition, this paper explores various optimization techniques, including quantization tools and model refinement strategies, aimed at minimizing latency while maintaining high accuracy. This study also emphasizes the importance of supporting low-resource languages, using Galician, a language spoken by fewer than 3 million people worldwide, as a case study. By benchmarking different variations of a Wav2Vec2.0-based ASR model fine-tuned for Galician, the most effective models are identified, as well as their optimal runtime configurations. This work underscores the critical role of low-latency on-device ASR systems in real-time IIoT and Industrial Metaverse applications, highlighting how these technologies can enhance operational efficiency, privacy and user experience in industrial environments. The findings demonstrate the significant potential of the on-device ASR system developed to enhance voice interactions in emerging Metaverse applications, especially for low-resource languages.
dc.description.sponsorship	This work has been supported by Centro Mixto de Investigación UDC-NAVANTIA (IN853C 2022/01), funded by GAIN (Xunta de Galicia) and ERDF Galicia 2021-2027, and by TED2021-129433A-C22 (HELENE), funded by MCIN/AEI/10.13039/501100011033 and the European Union NextGenerationEU/PRTR.
dc.description.sponsorship	Xunta de Galicia; IN853C 2022/01
dc.identifier.citation	Valladares-Poncela, A.; Fraga-Lamas, P.; Fernández-Caramés, T.M. On-Device Automatic Speech Recognition for IIoT and Extended Reality Industrial Metaverse Applications. Eng. Proc. 2024, 82, 3. https://doi.org/10.3390/ecsa-11-20466
dc.identifier.doi	10.3390/ecsa-11-20466
dc.identifier.issn	2673-4591
dc.identifier.uri	https://hdl.handle.net/2183/48050
dc.language.iso	eng
dc.publisher	MDPI
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023/TED2021-129433A-C22/ES/SISTEMA DE ALTA SEGURIDAD BASADO EN BLOCKCHAIN PARA LA GESTIÓN PRIVADA DE DATOS DE PACIENTES DE SERVICIOS DE SALUD DIGITALES
dc.relation.uri	https://doi.org/10.3390/ecsa-11-20466
dc.rights	Attribution 4.0 International	en
dc.rights.accessRights	open access
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/
dc.subject	Automatic Speech Recognition
dc.subject	ASR
dc.subject	Internet of Things
dc.subject	IIoT
dc.subject	Industrial Metaverse
dc.subject	Microsoft HoloLens 2
dc.subject	Extended Reality
dc.title	On-Device Automatic Speech Recognition for IIoT and Extended Reality Industrial Metaverse Applications
dc.type	conference output
dspace.entity.type	Publication
relation.isAuthorOfPublication	caa923d2-cf88-405e-9025-759d06cf3799
relation.isAuthorOfPublication	79dbfabd-7261-41ff-9667-2f774d5f341e
relation.isAuthorOfPublication.latestForDiscovery	caa923d2-cf88-405e-9025-759d06cf3799

Files

Original bundle

Name: FragaLamas_Paula_2024_On_Device_Automatic_Speech_Recognition.pdf
Size: 5.12 MB
Format: Adobe Portable Document Format

Collections

Investigación (FIC)