Automatic Speech Recognition
Automatic Speech Recognition (ASR) transcribes spoken language into text, enabling, for example, hands-free interaction with devices.
Conventional ASR systems extract acoustic features from the audio, use an acoustic model to score sequences of phonetic units, and map those units to words using a pronunciation lexicon and a language model.
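The word-mapping step can be illustrated with a toy example. The sketch below assumes a hypothetical four-word pronunciation lexicon with ARPAbet-style phones and hypothetical bigram log-probabilities standing in for a language model. It enumerates every way to segment a phone sequence into lexicon words and lets the language model choose between acoustically identical candidates ("ice cream" vs. "I scream"). Real decoders search acoustic scores, lexicon, and language model jointly (for example with weighted finite-state transducers or beam search) rather than enumerating all segmentations.

```python
from functools import lru_cache

# Hypothetical toy pronunciation lexicon (ARPAbet-style phones).
LEXICON = {
    "ice": ("AY", "S"),
    "cream": ("K", "R", "IY", "M"),
    "i": ("AY",),
    "scream": ("S", "K", "R", "IY", "M"),
}

# Hypothetical bigram log-probabilities standing in for a language model.
BIGRAM_LOGPROB = {
    ("<s>", "ice"): -4.0,
    ("ice", "cream"): -1.0,
    ("<s>", "i"): -1.0,
    ("i", "scream"): -6.0,
}
UNSEEN_LOGPROB = -10.0  # crude back-off score for unseen bigrams


def phones_to_words(phones):
    """Return every segmentation of `phones` into lexicon words."""
    phones = tuple(phones)

    @lru_cache(maxsize=None)
    def segment(start):
        # All word sequences that exactly cover phones[start:].
        if start == len(phones):
            return [[]]
        results = []
        for word, pron in LEXICON.items():
            end = start + len(pron)
            if phones[start:end] == pron:
                results.extend([word] + rest for rest in segment(end))
        return results

    return segment(0)


def lm_score(words):
    """Sum of bigram log-probabilities, starting from a sentence-start token."""
    history = ["<s>"] + list(words)
    return sum(BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB)
               for pair in zip(history, history[1:]))


candidates = phones_to_words(["AY", "S", "K", "R", "IY", "M"])
best = max(candidates, key=lm_score)
print(candidates)  # [['ice', 'cream'], ['i', 'scream']]
print(best)        # ['ice', 'cream']
```

Both candidates match the phones equally well, so only the language model score separates them; this is the role the lexicon and language model play in the pipeline described above.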
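Early systems combined hidden Markov models (HMMs), which model how speech unfolds over time, with Gaussian mixture models (GMMs), which model the acoustic features observed in each HMM state. Modern ASR relies on deep learning architectures, such as recurrent neural networks (RNNs) and transformers, which achieve higher accuracy, including in real-world conditions.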
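To make the HMM part concrete, here is a minimal Viterbi decoder, the dynamic-programming search that finds the most likely HMM state sequence for a sequence of observations. As a simplifying assumption, emissions are discrete symbols (for example, vector-quantized acoustic frames) instead of GMM densities over continuous features. The three-state left-to-right model and all of its probabilities are hypothetical.

```python
import math


def safe_log(p):
    """log(p), mapping zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden-state path for a discrete-emission HMM.

    Returns (log_prob, path). Works in log space to avoid underflow.
    """
    # best[t][s]: log-prob of the best path that ends in state s at time t
    best = [{s: safe_log(start_p[s]) + safe_log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + safe_log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + safe_log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical 3-state left-to-right model for one phone; observations are
# hypothetical quantized acoustic labels "a", "b", "c".
STATES = ("s1", "s2", "s3")
START = {"s1": 1.0, "s2": 0.0, "s3": 0.0}
TRANS = {
    "s1": {"s1": 0.6, "s2": 0.4, "s3": 0.0},
    "s2": {"s1": 0.0, "s2": 0.6, "s3": 0.4},
    "s3": {"s1": 0.0, "s2": 0.0, "s3": 1.0},
}
EMIT = {
    "s1": {"a": 0.8, "b": 0.1, "c": 0.1},
    "s2": {"a": 0.1, "b": 0.8, "c": 0.1},
    "s3": {"a": 0.1, "b": 0.1, "c": 0.8},
}

log_prob, path = viterbi(["a", "a", "b", "c", "c"], STATES, START, TRANS, EMIT)
print(path)  # ['s1', 's1', 's2', 's3', 's3']
```

The left-to-right topology (zero probability of moving back to an earlier state) mirrors how phone models were typically structured in HMM-based recognizers.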
ASR systems still face challenges such as background noise, variability between speakers, and differing accents.
Advances in noise reduction and speaker adaptation are making these systems more robust across applications.
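As a small, concrete example of robustness processing, the sketch below implements per-utterance cepstral mean and variance normalization (CMVN), a long-established feature normalization step rather than one of the recent advances mentioned above. A fixed recording channel adds a roughly constant offset to cepstral features, so subtracting the per-utterance mean removes much of it; dividing by the standard deviation further evens out differences in scale between recordings. The function name and the plain-list feature representation are assumptions made for illustration.

```python
def cmvn(frames, eps=1e-8):
    """Per-utterance cepstral mean and variance normalization.

    frames: list of equal-length feature vectors (e.g. MFCCs), one per frame.
    Returns new vectors in which every dimension has zero mean and
    (up to `eps`) unit variance across the utterance.
    """
    if not frames:
        raise ValueError("need at least one frame")
    num_frames = len(frames)
    dim = len(frames[0])
    # Per-dimension mean over all frames of the utterance.
    means = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    # Per-dimension (population) standard deviation.
    stds = [
        (sum((f[d] - means[d]) ** 2 for f in frames) / num_frames) ** 0.5
        for d in range(dim)
    ]
    return [[(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
            for f in frames]
```

Streaming recognizers often use a running or sliding-window mean instead, because the full utterance is not available in advance.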
ASR is widely used in virtual assistants, transcription tools, call centers, and accessibility solutions, including automatic captioning for videos and events.
Ongoing research explores multilingual ASR and end-to-end systems, which map audio (raw waveforms or spectral features) directly to text with a single model instead of chaining separately built acoustic, pronunciation, and language models.
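One common way to train such end-to-end models is connectionist temporal classification (CTC), in which the network emits, for every audio frame, a probability distribution over output symbols plus a special blank symbol; other families include attention-based encoder-decoders and transducers. The sketch below shows greedy (best-path) CTC decoding: take the most likely symbol in each frame, merge consecutive repeats, then drop blanks. The character alphabet, the use of "_" as the blank, and the one-hot toy frame probabilities are hypothetical choices for the example. Practical systems usually replace the greedy rule with beam search, often combined with a language model.

```python
def ctc_greedy_decode(frame_probs, alphabet):
    """Greedy (best-path) CTC decoding.

    frame_probs: one list of probabilities per audio frame, aligned with
    `alphabet`. alphabet[0] is assumed to be the blank symbol.
    """
    blank = alphabet[0]
    # 1. Pick the most likely symbol in every frame.
    best_path = [alphabet[max(range(len(p)), key=p.__getitem__)]
                 for p in frame_probs]
    # 2. Merge consecutive repeats and 3. drop blanks. Comparing against the
    #    previous frame (blank included) lets a blank separate real repeats.
    output = []
    previous = None
    for symbol in best_path:
        if symbol != previous and symbol != blank:
            output.append(symbol)
        previous = symbol
    return "".join(output)


ALPHABET = ["_", "a", "c", "t"]  # hypothetical: blank first, then characters


def one_hot(path):
    """Build toy frame probabilities whose per-frame argmax spells `path`."""
    return [[1.0 if s == ch else 0.0 for s in ALPHABET] for ch in path]


print(ctc_greedy_decode(one_hot("cc_aa_tt"), ALPHABET))  # cat
print(ctc_greedy_decode(one_hot("a_a"), ALPHABET))       # aa
```

The blank is what allows CTC to output a genuinely doubled letter: "a_a" decodes to "aa", whereas "aa" without a blank between the frames collapses to a single "a".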
ACL6060 test set
A collection of ACL 2022 paper presentations for which pre-recorded audio or
video was provided to the ACL Anthology.
Presentations include a variety of native and non-native English accents.
Presentations have been professionally transcribed and translated into ten
languages, including German, Portuguese, Dutch, and French. The dataset is
described in detail in Elizabeth Salesky, Kareem Darwish, Mohamed
Al-Badrashiny, Mona Diab, and Jan Niehues. 2023. Evaluating Multilingual
Speech Translation under Realistic Conditions with Resegmentation and
Terminology. In Proceedings of the 20th International Conference on Spoken
Language Translation (IWSLT 2023), pages 62-78, Toronto, Canada.
Association for Computational Linguistics.
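The description above does not name an evaluation metric. For ASR output, a standard choice is word error rate (WER) against the reference transcripts: the minimum number of word substitutions, deletions, and insertions needed to turn the reference into the hypothesis, divided by the number of reference words. The sketch below computes it with a word-level Levenshtein distance. It assumes the hypothesis is already aligned to the reference segment and that both have been normalized (casing, punctuation) the same way. The resegmentation of unsegmented system output discussed in the cited paper is not reproduced here, and the example sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j]: edit distance between the first i reference words
    # and the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("we evaluate speech translation",
                      "we evaluated speech translation systems"))  # 0.5
```

Here one substitution ("evaluate" to "evaluated") and one insertion ("systems") over four reference words give 0.5. Because insertions are counted, WER can exceed 1.0.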
COVOST test set
TODO: please update test set description