Dataset and Evaluation of Automatic Speech Recognition for Multi-lingual Intent Recognition on Social Robots

Antonio Andriella, Raquel Ros, Yoav Ellinson, Sharon Gannot, Séverin Lemaignan

Research output: Chapter in book; Conference contribution; Peer-reviewed

Abstract

While Automatic Speech Recognition (ASR) systems excel in controlled environments, challenges arise in robot-specific setups due to unique microphone requirements and added noise sources. In this paper, we create a dataset of initiating conversations with brief exchanges in 5 European languages, and we systematically evaluate current state-of-the-art ASR systems (Vosk, OpenWhisper, Google Speech and NVidia Riva). Besides standard metrics, we also look at two critical downstream tasks for human-robot verbal interaction: intent recognition rate and entity extraction, using the open-source Rasa chatbot. Overall, we found that open-source solutions such as Vosk perform competitively with closed-source solutions while running on the edge, on a low compute budget (CPU only).

Original language: English
Title of host publication: HRI 2024 - Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction
Publisher: IEEE Computer Society
Pages: 865-869
Number of pages: 5
ISBN (electronic): 9798400703225
Publication status: Published - 11 March 2024
Published externally
Event: 19th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2024 - Boulder, United States
Duration: 11 March 2024 to 15 March 2024

Publication series

Name: ACM/IEEE International Conference on Human-Robot Interaction
ISSN (electronic): 2167-2148

Conference

Conference: 19th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2024
Country/Territory: United States
City: Boulder
Period: 11/03/24 to 15/03/24
