Speech Translation
Speech translation combines automatic speech recognition (ASR), machine
translation (MT), and text-to-speech (TTS) to convert speech in one language
into another language. It enables communication between speakers of different
languages, often in real time.
This task begins with ASR, which transcribes the spoken input into text. The
text is then translated into the target language using MT models. Finally, TTS
converts the translated text back into speech if spoken output is required.
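The three-stage cascade described above can be sketched as follows. This is a minimal illustration only: the stage signatures, `cascade_translate`, `TranslationResult`, and the toy stand-in stages are hypothetical names introduced here, not part of any particular toolkit. Real ASR, MT, and TTS models would be plugged in behind the same callables.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stage signatures (assumptions for this sketch).
Recognizer = Callable[[bytes], str]    # source-language audio -> source text
Translator = Callable[[str], str]      # source text -> target-language text
Synthesizer = Callable[[str], bytes]   # target text -> target-language audio


@dataclass
class TranslationResult:
    transcript: str                 # output of the ASR stage
    translation: str                # output of the MT stage
    audio: Optional[bytes] = None   # output of the TTS stage, if requested


def cascade_translate(
    audio: bytes,
    asr: Recognizer,
    mt: Translator,
    tts: Optional[Synthesizer] = None,
) -> TranslationResult:
    """Run ASR, then MT, then (optionally) TTS on a single utterance."""
    transcript = asr(audio)
    translation = mt(transcript)
    # TTS is skipped when only text output is required.
    speech = tts(translation) if tts is not None else None
    return TranslationResult(transcript, translation, speech)


# Toy stand-ins so the sketch runs without any models.
def toy_asr(audio: bytes) -> str:
    # Pretends the "audio" bytes already contain the spoken words.
    return audio.decode("utf-8")


def toy_mt(text: str) -> str:
    # Word-by-word lookup in a tiny illustrative English-German lexicon.
    lexicon = {"hello": "hallo", "world": "welt"}
    return " ".join(lexicon.get(word, word) for word in text.lower().split())


def toy_tts(text: str) -> bytes:
    # Pretends that encoding the text produces synthesized audio.
    return text.encode("utf-8")


if __name__ == "__main__":
    result = cascade_translate(b"Hello world", toy_asr, toy_mt, toy_tts)
    print(result.transcript, "->", result.translation)
```

Keeping each stage behind a plain callable makes it easy to swap one component without touching the others. It also makes the main weakness of the cascade visible: any error in the transcript is passed unchanged to the translation stage.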
Speech translation presents distinct challenges, including handling spontaneous
speech, regional accents, and background noise. The system must also preserve
the original meaning, cultural nuances, and tone of the speaker during
translation.
Speech translation is widely used in international conferences, live broadcasts,
travel applications, and multilingual customer service. It plays a critical role
in breaking down language barriers in real-time conversations.
Future research aims to enhance the quality of speech translation through
end-to-end models that bypass intermediate steps, enabling faster and more
accurate translations across a wider range of languages.
ACL6060 test set
A collection of ACL 2022 paper presentations for which pre-recorded audio or video was provided to the ACL Anthology. The presentations cover a variety of native and non-native English accents and have been professionally transcribed and translated for ten language pairs, including four European target languages (German, Portuguese, Dutch, and French). The dataset is described in detail in Salesky et al. (2023), cited below.
Elizabeth Salesky, Kareem Darwish, Mohamed Al-Badrashiny, Mona Diab, and Jan Niehues, 2023, Evaluating Multilingual Speech Translation under Realistic Conditions with Resegmentation and Terminology, in Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), pages 62–78, Toronto, Canada, Association for Computational Linguistics.
MUSTC test set
MuST-C is a large, freely available Multilingual Speech Translation Corpus built from English TED Talks. Its distinguishing features are (i) language coverage and diversity (from English into 14 languages from different families), (ii) size (at least 237 hours of transcribed recordings per language, 430 on average), (iii) variety of topics and speakers, and (iv) data quality. The audio recordings are automatically aligned at the sentence level with their manual transcriptions and translations. The MuST-C corpus is available to download for research purposes under a Creative Commons Attribution 4.0 International License. The test set referred to here is the English component of the MuST-C v1.3 en-de tst-COMMON set.
Roldano Cattoni, Mattia Antonino Di Gangi, Luisa Bentivogli, Matteo Negri, and Marco Turchi, 2021, MuST-C: A multilingual corpus for end-to-end speech translation, in Computer Speech & Language, Volume 66, March 2021.
MTEDX test set
The Multilingual TEDx (mTEDx) corpus comprises audio recordings and transcripts from TEDx Talks in 8 languages, 6 of them European (Spanish, French, Portuguese, Italian, Greek, and German), with translations into up to 5 languages, all of them European (English, Spanish, French, Portuguese, and Italian). The audio recordings are automatically aligned at the sentence level with their manual transcripts and translations. The mTEDx dataset is available to download for research purposes under a Creative Commons Attribution 4.0 International License.
Elizabeth Salesky, Matthew Wiesner, Jacob Bremerman, Roldano Cattoni, Matteo Negri, Marco Turchi, Douglas W. Oard, and Matt Post, 2021, Multilingual TEDx Corpus for Speech Recognition and Translation, in Proceedings of Interspeech 2021, Brno, Czech Republic.