On-Device Automatic Speech Recognition for IIoT and Extended Reality Industrial Metaverse Applications

Bibliographic citation

Valladares-Poncela, A.; Fraga-Lamas, P.; Fernández-Caramés, T.M. On-Device Automatic Speech Recognition for IIoT and Extended Reality Industrial Metaverse Applications. Eng. Proc. 2024, 82, 3. https://doi.org/10.3390/ecsa-11-20466

Abstract

This paper presents a comprehensive study on enhancing Industrial Internet of Things (IIoT) and Industrial Metaverse applications by integrating On-Device Automatic Speech Recognition (ASR) into Microsoft HoloLens 2 smart glasses. Specifically, the paper focuses on using the HoloLens 2 microphone array and sound capture APIs to benchmark the performance and accuracy of on-device ASR models, which are evaluated with metrics such as Character Error Rate (CER), Word Error Rate (WER) and latency. In addition, the paper explores several optimization techniques, including quantization tools and model refinement strategies, aimed at minimizing latency while maintaining high accuracy. The study also emphasizes the importance of supporting low-resource languages, using Galician, a language spoken by fewer than 3 million people worldwide, as a case study. By benchmarking different variations of a Wav2Vec2.0-based ASR model fine-tuned for Galician, the most effective models and their optimal runtime configurations are identified. This work underscores the critical role of low-latency on-device ASR systems in real-time IIoT and Industrial Metaverse applications, highlighting how these technologies can enhance operational efficiency, privacy and user experience in industrial environments. The findings demonstrate the significant potential of the developed on-device ASR system to enhance voice interactions in emerging Metaverse applications, especially for low-resource languages.
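The abstract reports accuracy as WER and CER. As a reminder of how these metrics are commonly defined, the following is a minimal sketch, not the authors' evaluation code. It assumes the usual definition: the Levenshtein edit distance (substitutions + deletions + insertions) between the reference transcript and the ASR hypothesis, divided by the length of the reference in words (WER) or characters (CER). The function names `edit_distance`, `wer` and `cer` are hypothetical, and the sketch applies no text normalization (casing, punctuation, whitespace), which real evaluations often do and which can change the reported figures.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences; every edit costs 1."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix to each hyp prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of r
                          curr[j - 1] + 1,     # insertion of h
                          prev[j - 1] + cost)  # substitution (or match)
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count.

    Assumes a non-empty reference (otherwise division by zero)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits (spaces included)
    divided by reference character count. Assumes a non-empty reference."""
    return edit_distance(reference, hypothesis) / len(reference)
```

For example, with the illustrative Galician reference "abre a porta" and the hypothesis "abre porta", `wer` gives 1/3 (one deleted word out of three) and `cer` gives 2/12 (the deleted "a" and one space out of twelve characters).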

Description

Presented at The 11th International Electronic Conference on Sensors and Applications (ECSA-11), 26–28 November 2024, Online; Available online: https://sciforum.net/event/ecsa-11.

Rights

Attribution 4.0 International

Except where otherwise noted, this item's license is described as Attribution 4.0 International