Speech Translation

Speech Translation combines automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) to convert speech in one language into another language. It enables communication between speakers of different languages, often in real time. The pipeline begins with ASR, which transcribes the spoken input into text. The text is then translated into the target language using MT models. Finally, TTS converts the translated text back into speech if spoken output is required. Speech translation presents particular challenges, including spontaneous speech, regional accents, and background noise. The system must also preserve the speaker's original meaning, cultural nuances, and tone during translation. Speech translation is widely used in international conferences, live broadcasts, travel applications, and multilingual customer service, where it helps break down language barriers in real-time conversations. Future research aims to improve translation quality through end-to-end models that bypass the intermediate steps, enabling faster and more accurate translation across a wider range of languages.
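The three-stage cascade described above (ASR, then MT, then optional TTS) can be sketched as follows. This is a minimal illustration, not the implementation of any particular system: the names translate_speech, CascadeResult, toy_asr, toy_mt, toy_tts and TOY_LEXICON are all hypothetical, and the toy components are stand-ins for real models.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical component signatures; a real system would wrap actual models.
Recognizer = Callable[[bytes], str]    # audio -> source-language text (ASR)
Translator = Callable[[str], str]      # source text -> target text (MT)
Synthesizer = Callable[[str], bytes]   # target text -> audio (TTS)


@dataclass
class CascadeResult:
    transcript: str                # ASR output
    translation: str               # MT output
    audio: Optional[bytes] = None  # TTS output, only if requested


def translate_speech(
    audio: bytes,
    asr: Recognizer,
    mt: Translator,
    tts: Optional[Synthesizer] = None,
) -> CascadeResult:
    """Run the cascade: transcribe, translate, optionally synthesize."""
    transcript = asr(audio)
    translation = mt(transcript)
    speech = tts(translation) if tts is not None else None
    return CascadeResult(transcript, translation, speech)


# Toy stand-ins (assumptions for illustration only): "audio" is just
# UTF-8 text, and translation is a word-by-word dictionary lookup.
TOY_LEXICON = {"hello": "hallo", "world": "welt"}


def toy_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def toy_mt(text: str) -> str:
    return " ".join(TOY_LEXICON.get(word, word) for word in text.lower().split())


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


result = translate_speech(b"Hello world", toy_asr, toy_mt, toy_tts)
print(result.transcript)   # Hello world
print(result.translation)  # hallo welt
```

Passing the components as callables keeps each stage replaceable, and leaving tts unset returns text output only. Errors made by an early stage propagate to the later ones, which is one motivation for the end-to-end models mentioned above.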

ACL6060 test set

A collection of ACL 2022 paper presentations for which pre-recorded audio or video was provided to the ACL Anthology. The presentations cover a variety of native and non-native English accents. They have been professionally transcribed and translated for ten language pairs, including 4 European target languages (German, Portuguese, Dutch, and French). The dataset is described in detail in Salesky et al. (2023).

Elizabeth Salesky, Kareem Darwish, Mohamed Al-Badrashiny, Mona Diab, Jan Niehues, 2023, Evaluating Multilingual Speech Translation under Realistic Conditions with Resegmentation and Terminology, in Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), pages 62–78, Toronto, Canada, Association for Computational Linguistics.

Speech Translation

Source language	Target languages
en	de, fr, nl, pt

MUSTC test set

MuST-C is a large and freely available Multilingual Speech Translation Corpus built from English TED Talks. Its unique features include: i) language coverage and diversity (from English into 14 languages from different families), ii) size (at least 237 hours of transcribed recordings per language, 430 on average), iii) variety of topics and speakers, and iv) data quality. The audio recordings from English TED Talks are automatically aligned at the sentence level with their manual transcriptions and translations. The MuST-C corpus is available to download for research purposes under a Creative Commons Attribution 4.0 International License. The dataset is the English component of the MuST-C v1.3 en-de, tst-COMMON set.

Roldano Cattoni, Mattia Antonino Di Gangi, Luisa Bentivogli, Matteo Negri, Marco Turchi, 2021, MuST-C: A multilingual corpus for end-to-end speech translation, Computer Speech & Language, Volume 66, March 2021.

Speech Translation

Source language	Target languages
en	cs, de, es, fr, it, nl, pt, ro

MTEDX test set

The corpus comprises audio recordings and transcripts from TEDx Talks in 8 languages, 6 of which are European (Spanish, French, Portuguese, Italian, Greek, and German). Translations are provided into up to 5 target languages, all of them European (English, Spanish, French, Portuguese, and Italian). The audio recordings are automatically aligned at the sentence level with their manual transcripts and translations. The mTEDx dataset is available to download for research purposes under a Creative Commons Attribution 4.0 International License.

Elizabeth Salesky, Matthew Wiesner, Jacob Bremerman, Roldano Cattoni, Matteo Negri, Marco Turchi, Douglas W. Oard, Matt Post, 2021, Multilingual TEDx Corpus for Speech Recognition and Translation, in Proceedings of Interspeech 2021, Brno, Czech Republic.

Speech Translation

Source languages	Target language
el, es, fr, it, pt	en