Speech Summarization
Speech summarization condenses spoken content into short summaries that capture
the main points, so users do not have to read a full transcript. It can be
applied to recordings of meetings, interviews, or lectures, saving users time by
surfacing the key insights.
There are two main methods: extractive, which selects important segments from
the audio or its transcript, and abstractive, which generates a new, concise
summary in its own words. Some models build on automatic speech recognition
(ASR) outputs, while others aim to summarize audio directly.
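As a rough illustration of the extractive approach applied to ASR output, the sketch below scores each transcript segment by the average corpus frequency of its content words and keeps the top-scoring segments in their original order. The function names, the stopword list, and the frequency-based scoring are assumptions made for this example; they are not the method of any particular system.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {
    "the", "a", "an", "and", "or", "to", "of", "in", "on", "is", "are",
    "we", "it", "that", "this", "for", "so", "i", "you",
}


def tokenize(text):
    """Lowercase the text and return its non-stopword tokens."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(segments, k=2):
    """Return the k highest-scoring segments, kept in their original order."""
    # Word frequencies over the whole transcript.
    freq = Counter(w for seg in segments for w in tokenize(seg))

    def score(seg):
        # Average frequency of the segment's content words.
        words = tokenize(seg)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(segments)), key=lambda i: score(segments[i]), reverse=True)
    keep = sorted(ranked[:k])  # restore chronological order
    return [segments[i] for i in keep]
```

Called on a list of transcript segments, `extractive_summary(segments, k=2)` returns two of the original segments unchanged. An abstractive system would instead generate new text, typically with a sequence-to-sequence model.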
Handling spontaneous speech with hesitations, repetitions, and interruptions
poses a challenge. Robust systems must also account for background noise,
accents, and speaker variability.
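One simple way to reduce the impact of hesitations and repetitions is to clean the transcript before summarizing it. The sketch below drops filler words and collapses immediately repeated words. The filler list and the function name are assumptions for illustration; real systems often use trained disfluency detectors instead of rules like these.

```python
import re

# Illustrative filler list (an assumption); extend it for your data.
FILLERS = {"um", "uh", "er", "erm", "hmm"}


def clean_disfluencies(utterance):
    """Drop filler words and collapse immediate word repetitions.

    Punctuation is discarded, and legitimate repetitions such as
    "that that" are also collapsed, so this is only a rough heuristic.
    """
    tokens = re.findall(r"[A-Za-z']+", utterance)
    cleaned = []
    for tok in tokens:
        low = tok.lower()
        if low in FILLERS:
            continue  # skip hesitation markers
        if cleaned and cleaned[-1].lower() == low:
            continue  # skip an immediate repetition of the previous word
        cleaned.append(tok)
    return " ".join(cleaned)
```

For example, `clean_disfluencies("Um so the the model uh it it works")` yields `"so the model it works"`.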
Speech summarization finds applications in business, education, and media,
helping users manage large volumes of audio data efficiently.
Research is moving toward real-time speech summarization, allowing live meetings
and events to be summarized on the fly.
ICSI test set
The ICSI Meeting Corpus is a collection of 75 meetings recorded at the International Computer Science Institute (ICSI) in Berkeley during the years 2000-2002 and released under the CC-BY-4.0 license. The meetings are "natural" in the sense that they would have occurred anyway: they are generally regular weekly meetings of various ICSI working teams, including the team working on the ICSI Meeting Project. The dataset includes the English audio, as well as human-written transcripts and summaries. In the textual summarization task, the audio portion of the dataset is not used.
The test set is a split of 6 meetings extracted from the corpus by the Meetween project partner Zoom.
A. Janin, D. Baron, J. Edwards, D. Ellis, D. Gelbart, N. Morgan, "The ICSI Meeting Corpus," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Hong Kong, China, 2003.