Dataset and Evaluation of Automatic Speech Recognition for Multi-lingual Intent Recognition on Social Robots

Antonio Andriella, Raquel Ros, Yoav Ellinson, Sharon Gannot, Séverin Lemaignan

Research output: Book chapter › Conference contribution › peer-review

Abstract

While Automatic Speech Recognition (ASR) systems excel in controlled environments, challenges arise in robot-specific setups due to unique microphone requirements and added noise sources. In this paper, we create a dataset of conversation openings and brief exchanges in five European languages, and we systematically evaluate current state-of-the-art ASR systems (Vosk, OpenWhisper, Google Speech and NVIDIA Riva). Besides standard metrics, we also examine two downstream tasks that are critical for human-robot verbal interaction, intent recognition and entity extraction, using the open-source Rasa chatbot framework. Overall, we found that open-source solutions such as Vosk perform competitively with closed-source solutions while running on the edge with a low compute budget (CPU only).
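
The abstract refers to "standard metrics" without naming them. Word error rate (WER) is the usual standard metric for ASR, and an intent recognition rate can be computed as plain accuracy over labelled utterances. The Python sketch below illustrates both under stated assumptions: whitespace tokenisation with lowercasing is the only text normalisation, and the predicted intent labels would come from running an NLU model (such as Rasa) over the ASR transcripts. The function names are hypothetical, and this is not the authors' evaluation code.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed with a word-level Levenshtein (edit-distance) table.
    Assumption: lowercasing plus whitespace splitting is the only normalisation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


def intent_recognition_rate(expected: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of utterances whose predicted intent label equals the expected one."""
    if not expected or len(expected) != len(predicted):
        raise ValueError("need two non-empty label sequences of equal length")
    hits = sum(e == p for e, p in zip(expected, predicted))
    return hits / len(expected)
```

For example, with the reference "hello robot how are you" and the hypothesis "hello robot who are you", `word_error_rate` returns 0.2 (one substitution over five reference words). With hypothetical intent labels `["greet", "ask_name", "goodbye"]` and predictions `["greet", "ask_name", "affirm"]`, `intent_recognition_rate` returns 2/3. Note that WER is not capped at 1: a hypothesis with many inserted words can push it above 100%.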

Original language: English
Title of host publication: HRI 2024 - Proceedings of the 2024 ACM/IEEE International Conference on Human-Robot Interaction
Publisher: IEEE Computer Society
Pages: 865-869
Number of pages: 5
ISBN (Electronic): 9798400703225
Publication status: Published - 11 Mar 2024
Externally published: Yes
Event: 19th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2024 - Boulder, United States
Duration: 11 Mar 2024 to 15 Mar 2024

Publication series

Name: ACM/IEEE International Conference on Human-Robot Interaction
ISSN (Electronic): 2167-2148

Conference

Conference: 19th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2024
Country/Territory: United States
City: Boulder
Period: 11/03/24 to 15/03/24

Keywords

  • Assistive Robotics
  • Audio Dataset
  • Automatic Speech Recognition
  • Human-Robot Interaction
