Speech Question-Answering
Speech Question-Answering (SQA) allows users to ask questions in spoken
language and receive responses immediately. The task combines automatic speech
recognition (ASR) to convert speech to text, natural language understanding
(NLU) to identify the query, and question-answering models to retrieve the
relevant information.
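The three stages described above can be chained as a simple cascade. The following is a minimal sketch, not a reference implementation: the names run_sqa_cascade, SQAResult, and the toy_* functions are hypothetical, and the toy stages are stand-ins for a real ASR system, NLU component, and QA model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SQAResult:
    transcript: str  # raw ASR output
    query: str       # cleaned-up question passed to the QA model
    answer: str      # answer text returned by the QA model


def run_sqa_cascade(
    audio: bytes,
    context: str,
    transcribe: Callable[[bytes], str],
    understand: Callable[[str], str],
    answer: Callable[[str, str], str],
) -> SQAResult:
    """Run ASR, then NLU, then QA, passing each stage's output to the next."""
    transcript = transcribe(audio)
    query = understand(transcript)
    return SQAResult(transcript, query, answer(query, context))


# Toy stand-ins (assumptions for illustration only).
FILLERS = {"um", "uh", "er"}


def toy_transcribe(audio: bytes) -> str:
    # Pretends the audio bytes already contain the spoken words as text.
    return audio.decode("utf-8")


def toy_understand(transcript: str) -> str:
    # Drops a few filler words, a crude proxy for handling disfluencies.
    words = [w for w in transcript.split() if w.lower() not in FILLERS]
    return " ".join(words)


def toy_answer(query: str, context: str) -> str:
    # Returns the context sentence sharing the most words with the query.
    query_words = set(query.lower().rstrip("?").split())
    sentences = [s.strip() for s in context.split(".") if s.strip()]
    return max(
        sentences,
        key=lambda s: len(query_words & set(s.lower().split())),
        default="",
    )


result = run_sqa_cascade(
    audio=b"um where is the uh Eiffel Tower",
    context="The Eiffel Tower is in Paris. The Colosseum is in Rome.",
    transcribe=toy_transcribe,
    understand=toy_understand,
    answer=toy_answer,
)
print(result.query)
print(result.answer)
```

Passing the stages in as callables keeps the cascade independent of any particular ASR or QA library, so each stage can be swapped or evaluated separately.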
Handling spoken inputs introduces challenges such as disfluencies, accents, and
background noise. Improvements in ASR and NLP are essential to ensure accurate
and meaningful responses.
SQA powers voice assistants, automated customer support, and interactive kiosks,
offering hands-free and accessible user interactions. It enhances usability by
enabling natural conversations between users and systems.
Future developments may focus on multi-turn dialogues, where the system can
maintain context over multiple exchanges to deliver more coherent responses.
Spoken-SQuAD test set
Spoken-SQuAD is a spoken question answering dataset built on top of the SQuAD dataset and released under the CC-BY-SA-4.0 license. In Spoken-SQuAD, the document is in spoken form, the question is given as text, and the answer to each question is always a span in the document. The spoken documents were generated from the SQuAD textual articles using a Google text-to-speech system, and the corresponding automatic transcripts were generated using CMU Sphinx. The questions were left in text form.
The SQuAD training set was used to generate the Spoken-SQuAD training set, and the SQuAD development set was used to generate the Spoken-SQuAD test set. All question-answer pairs whose answer did not appear in the ASR transcript of the associated article were removed. The data used here is the split derived from the SQuAD development set.
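The filtering step described above (dropping question-answer pairs whose answer does not appear in the ASR transcript) can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the matching rule (lowercasing, stripping punctuation, whole-word substring match) and the record fields article_id, question, and answer are hypothetical, since the description does not specify them.

```python
import string

# Translation table that deletes ASCII punctuation (assumed normalization).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace."""
    return " ".join(text.lower().translate(_PUNCT_TABLE).split())


def answer_in_transcript(answer: str, transcript: str) -> bool:
    """True if the normalized answer occurs as whole words in the transcript."""
    ans = normalize(answer)
    # Padding with spaces prevents matches that start or end mid-word.
    return bool(ans) and f" {ans} " in f" {normalize(transcript)} "


def filter_qa_pairs(qa_pairs, transcripts):
    """Keep only the pairs whose answer survives in the article's transcript."""
    return [
        qa for qa in qa_pairs
        if answer_in_transcript(qa["answer"], transcripts[qa["article_id"]])
    ]


# Toy data (invented for illustration, not taken from the dataset).
transcripts = {
    "a1": "the normans gave their name to normandy a region in france",
}
qa_pairs = [
    {"article_id": "a1", "question": "Where is Normandy?", "answer": "France"},
    {"article_id": "a1", "question": "When did it happen?", "answer": "10th century"},
]

kept = filter_qa_pairs(qa_pairs, transcripts)
print([qa["answer"] for qa in kept])
```

In this toy run the second pair is dropped because "10th century" never appears in the transcript, which mirrors how recognition errors can make an answer span unrecoverable.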
Chia-Hsuan Li, Szu-Lin Wu, Chi-Liang Liu, Hung-yi Lee. 2018. Spoken SQuAD: A Study of Mitigating the Impact of Speech Recognition Errors on Listening Comprehension. Proceedings of Interspeech 2018, Hyderabad, India.