ICSI

The ICSI Meeting Corpus is a collection of 75 meetings recorded at the International Computer Science Institute (ICSI) in Berkeley between 2000 and 2002, released under the CC-BY-4.0 license. The meetings are "natural" in the sense that they would have taken place anyway: most are regular weekly meetings of various ICSI working teams, including the team working on the ICSI Meeting Project. The dataset includes the English audio as well as human-written transcripts and summaries. The textual summarization task does not use the audio portion of the dataset.

This dataset is a split of 6 meetings extracted by Zoom, a partner in the Meetween project.

A. Janin, D. Baron, J. Edwards, D. Ellis, D. Gelbart, N. Morgan, et al. 2003. The ICSI Meeting Corpus. In Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Hong Kong, China.

Speech Summarization

Language: en

Summarization

Language: en