Offline
Over the years, we have tracked the progress of cascaded and end-to-end approaches across a variety of settings, including diverse languages, domains, speaking styles, and recording conditions. We will continue this tradition and challenge the community to test their SLT solutions, including those using LLMs, with our evaluation framework.
An SLT system’s performance will be evaluated by how closely its translations match the target-language references. Similarity will be measured with multiple automatic metrics: COMET, BLEURT, BLEU, TER, and characTER. Submitted runs will be ranked by the COMET score computed on the test set, after the hypotheses have been automatically resegmented against the reference translations with mwerSegmenter. The detailed evaluation script can be found in SLT.KIT. In addition, a human evaluation will be performed on each participant’s best-performing submission.
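To illustrate why resegmentation is needed before scoring, the following is a minimal sketch of the idea: the hypothesis is treated as one unsegmented word stream and cut into as many segments as the reference has, at the points where a minimum-edit-distance alignment crosses the reference segment boundaries. This is a simplified illustration and not the mwerSegmenter implementation; the function name `resegment` and the tie-breaking choices are assumptions made for this example.

```python
from typing import List


def resegment(hyp: List[str], ref_segments: List[List[str]]) -> List[List[str]]:
    """Cut `hyp` into len(ref_segments) pieces along a Levenshtein alignment.

    Hypothetical helper for illustration only; real tools handle
    tokenization, casing, and multiple references as well.
    """
    ref = [tok for seg in ref_segments for tok in seg]
    n, m = len(ref), len(hyp)

    # dp[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # match/substitute
                dp[i - 1][j] + 1,  # reference word missing from hypothesis
                dp[i][j - 1] + 1,  # extra hypothesis word
            )

    # Backtrace; cut[i] is the hypothesis position aligned to reference
    # position i (the first one reached when walking back from the end).
    cut = [None] * (n + 1)
    i, j = n, m
    cut[n] = m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1
        if cut[i] is None:
            cut[i] = j

    # Translate reference segment boundaries into hypothesis cut points.
    segments, start, ref_pos = [], 0, 0
    for k, seg in enumerate(ref_segments):
        ref_pos += len(seg)
        end = m if k == len(ref_segments) - 1 else cut[ref_pos]
        segments.append(hyp[start:end])
        start = end
    return segments


reference = [["the", "cat", "sat"], ["on", "the", "mat"]]
hypothesis = "the cat sat down on the mat".split()
print(resegment(hypothesis, reference))
```

Once the hypothesis has the same number of segments as the reference, segment-level metrics such as COMET can be computed pairwise. In this sketch an extra hypothesis word at a boundary is attached to the preceding segment, which is an arbitrary choice among equally good alignments.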
CHALLENGEACCENT test set
TODO: please update test set description
IWSLT25INSTRUCT test set
The IWSLT25Instruct test set consists of audio recordings from the scientific domain, specifically presentations of research papers at major NLP conferences within the *ACL community. These recordings feature one of the authors presenting their paper’s scientific content in English.
BUSINESSNEWS test set
TODO: please update test set description
TVSERIES test set
The TVSERIES test set is provided by ITV Studios, part of ITV Plc, which includes the UK’s largest commercial broadcaster. ITV Studios creates and produces a broad range of programming (drama, entertainment, factual) in 13 countries and distributes it globally with high-quality subtitles. We would like to thank ITV Studios for providing IWSLT with samples of their video content for research and evaluation purposes. We ask that you not use these videos or the accompanying subtitles for any commercial purpose, and that you not make them publicly available on any other website.