Speech Summarization

Speech summarization condenses spoken content into a short summary that captures its main points, without the need to go through a full transcription. The task applies to recordings of meetings, interviews, or lectures, and saves users time by surfacing the key information. There are two main approaches: extractive methods, which select the most important segments of the audio (or of its transcript), and abstractive methods, which generate new, concise text. Some models build on the output of automatic speech recognition (ASR), while others aim to summarize the audio directly. A key challenge is spontaneous speech, with its hesitations, repetitions, and interruptions; robust systems must also cope with background noise, accents, and speaker variability. Speech summarization has applications in business, education, and media, where it helps users manage large volumes of audio efficiently. Research is moving toward real-time speech summarization, so that live meetings and events can be summarized as they happen.
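The extractive approach can be illustrated with a minimal sketch, assuming the audio has already been transcribed by an ASR system into a list of text segments (one per utterance). The function names, the filler-word list, and the frequency-based scoring are hypothetical choices for illustration only, not the method of any particular system; an abstractive summarizer would instead need a generative model and is not shown.

```python
import re
from collections import Counter

# Hypothetical stop/filler list: spontaneous speech contains many
# hesitations ("um", "uh") and backchannels ("yeah") that carry no content.
FILLERS = {"um", "uh", "so", "yeah", "like", "you", "know",
           "the", "a", "and", "to", "of", "we", "i", "it", "is"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def content_words(segment: str) -> list[str]:
    """Keep only the tokens that are not in the filler list."""
    return [w for w in tokenize(segment) if w not in FILLERS]


def extractive_summary(segments: list[str], k: int = 2) -> list[str]:
    """Select the k transcript segments whose content words are most
    frequent across the whole recording, returned in original order."""
    counts = Counter(w for seg in segments for w in content_words(seg))

    def score(seg: str) -> float:
        words = content_words(seg)
        # Average frequency, so long segments are not favored just for length;
        # segments made only of fillers score zero.
        return sum(counts[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(segments)),
                    key=lambda i: score(segments[i]), reverse=True)
    # Restore chronological order so the summary reads like the meeting.
    return [segments[i] for i in sorted(ranked[:k])]
```

For example, given the segments "Um so uh let's get started", "The recognizer error rate dropped on the meeting data", "Yeah, yeah.", and "We need more meeting data to lower the error rate further", the function keeps the second and fourth segments, because the filler-only turn scores zero and the opening turn contains only words that occur once.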

ICSI test set

The ICSI Meeting Corpus is a collection of 75 meetings recorded at the International Computer Science Institute (ICSI) in Berkeley between 2000 and 2002 and released under the CC-BY-4.0 license. The meetings are "natural" in the sense that they would have taken place anyway: most are regular weekly meetings of various ICSI working teams, including the team working on the ICSI Meeting Project. The dataset includes the English audio, together with transcripts and human-written summaries. In the textual summarization task, the audio portion of the dataset is not used. This test set is a split of 6 meetings extracted by Zoom, a partner in the Meetween project.

A. Janin, D. Baron, J. Edwards, D. Ellis, D. Gelbart, N. Morgan, 2003, The ICSI Meeting Corpus, 2003 Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Hong Kong, China.

Speech Summarization

Language	en