Automatic Speech Recognition

Automatic Speech Recognition (ASR) transcribes spoken language into text, which lets users interact with devices hands-free. Classical ASR pipelines analyze the audio signal, convert it into phonetic sequences, and map those sequences to words using a language model. Early systems were built on Hidden Markov Models and Gaussian Mixture Models. Modern ASR relies on deep learning architectures, such as recurrent neural networks (RNNs) and transformers, which deliver higher accuracy in real-world environments. ASR systems still face challenges such as background noise, speaker variability, and accent differences, and advances in noise reduction and speaker adaptation are making them more robust. ASR is widely used in virtual assistants, transcription tools, call centers, and accessibility solutions, including automatic captioning for videos and events. Ongoing research explores multilingual ASR and end-to-end systems that process raw audio directly.
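ASR accuracy on test sets such as the ones listed here is commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the system output into the reference transcript, divided by the number of reference words. This page does not state which metric is used for these test sets, so the sketch below is only an illustration under that assumption. The function name `word_error_rate` is hypothetical, and the sketch splits on whitespace without the text normalization (casing, punctuation) that real evaluations usually apply.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Lower values are better, and because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.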

ACL6060 test set

Collection of ACL 2022 paper presentations for which pre-recorded audio or video presentations were provided to the ACL Anthology. The presentations cover a variety of native and non-native English accents. They have been professionally transcribed and translated into ten target languages, including four European languages (German, Portuguese, Dutch, and French). The dataset is described in detail in: Elizabeth Salesky, Kareem Darwish, Mohamed Al-Badrashiny, Mona Diab, and Jan Niehues. 2023. Evaluating Multilingual Speech Translation under Realistic Conditions with Resegmentation and Terminology. In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), pages 62-78, Toronto, Canada. Association for Computational Linguistics.

Automatic Speech Recognition

Language en

CoVoST test set

TODO: please update test set description

Automatic Speech Recognition

Language de es fr it