Speech Summarization
Speech summarization condenses spoken content into short summaries that capture the main points without reproducing a full transcription.
The task applies to recordings of meetings, interviews, or lectures and saves users time by surfacing the key insights.
There are two main approaches: extractive, which selects the most important segments from the audio, and abstractive, which generates a new, concise summary.
Some models build on the output of automatic speech recognition (ASR), while others aim to summarize the audio directly.
Spontaneous speech, with its hesitations, repetitions, and interruptions, is a particular challenge.
Robust systems must also account for background noise, accents, and speaker variability.
Speech summarization finds applications in business, education, and media, helping users manage large volumes of audio data efficiently.
Research is moving toward real-time speech summarization, allowing live meetings and events to be summarized on the fly.
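To make the extractive approach concrete, here is a minimal sketch, assuming the audio has already been split into transcribed segments (for example by an ASR system). It scores each segment by the average corpus frequency of its content words and keeps the top k in their original order. The stopword list, the scoring heuristic, and the function names are illustrative assumptions, not the method of any particular system.

```python
import re
from collections import Counter

# Tiny illustrative stopword/filler list (an assumption, not a standard resource).
STOPWORDS = {
    "the", "a", "an", "and", "or", "to", "of", "in", "for", "on", "is", "it",
    "we", "i", "you", "so", "that", "this", "um", "uh", "know",
}

def tokenize(text):
    """Lowercase the text and keep tokens that are not stopwords or fillers."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def extractive_summary(segments, k=2):
    """Return the k highest-scoring segments, preserving their original order."""
    # Word frequencies over the whole recording.
    freq = Counter(w for seg in segments for w in tokenize(seg))

    def score(seg):
        words = tokenize(seg)
        # Average frequency, so long segments are not favoured by length alone.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(segments)), key=lambda i: score(segments[i]), reverse=True)
    keep = sorted(ranked[:k])  # restore chronological order
    return [segments[i] for i in keep]

# Hypothetical transcript segments from a meeting recording.
segments = [
    "Um, so, uh, you know",
    "The budget review is due Friday",
    "We need the budget numbers for the review",
    "Lunch was nice",
]
summary = extractive_summary(segments, k=2)
print(summary)
```

In this toy input the filler-only segment scores zero and is dropped, which hints at why hesitations and repetitions need explicit handling. An abstractive system would instead generate new text rather than return original segments.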
ICSI test set
The ICSI Meeting Corpus is a collection of 75 meetings recorded at the
International Computer Science Institute (ICSI) in Berkeley during the years
2000-2002 and released under the CC-BY-4.0 license. The meetings included
are "natural" meetings in the sense that they would have occurred anyway:
they are generally regular weekly meetings of various ICSI working teams,
including the team working on the ICSI Meeting Project. The dataset includes
the English audio, as well as transcripts and summaries written by humans.
In the textual summarization task the audio portion of the dataset is not
used.
The test set is a split of 6 meetings extracted by the Meetween project
partner Zoom.
A. Janin, D. Baron, J. Edwards, D. Ellis, D. Gelbart, N. Morgan, 2003,
The ICSI Meeting Corpus, Proceedings of the IEEE International
Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Hong
Kong, China.