Speech Question-Answering
Speech Question-Answering (SQA) allows users to ask questions using spoken language and receive responses instantly.
This task combines automatic speech recognition (ASR) to convert speech to text, natural language understanding (NLU) to interpret the query, and question-answering models to retrieve the relevant information.
Handling spoken inputs introduces challenges such as disfluencies, accents, and background noise.
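Because recognition errors propagate to the downstream components, improvements in both ASR and natural language processing (NLP) are needed to produce accurate and meaningful responses.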
SQA powers voice assistants, automated customer support, and interactive kiosks, offering hands-free and accessible user interactions.
It enhances usability by enabling natural conversations between users and systems.
Future developments may focus on multi-turn dialogues, where the system can maintain context over multiple exchanges to deliver more coherent responses.
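The three stages described above (ASR, then NLU, then question answering) can be illustrated with a minimal sketch. Every component here is a hypothetical stand-in and not a real model: toy_asr simply decodes bytes as text, strip_disfluencies drops a small assumed list of filler words, and overlap_qa picks the context sentence with the largest word overlap. The names SQAPipeline, toy_asr, strip_disfluencies, and overlap_qa are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Assumed filler-word list, for illustration only.
FILLERS = {"um", "uh", "er", "hmm"}

def toy_asr(audio: bytes) -> str:
    """Stand-in for an ASR model: treats the audio bytes as UTF-8 text."""
    return audio.decode("utf-8")

def strip_disfluencies(transcript: str) -> str:
    """Rough NLU step: lowercase, drop punctuation and filler words."""
    tokens = re.findall(r"[a-z0-9']+", transcript.lower())
    return " ".join(t for t in tokens if t not in FILLERS)

def overlap_qa(query: str, context: str) -> str:
    """Toy QA step: return the context sentence sharing the most words with the query.

    Assumes the context contains at least one sentence.
    """
    query_words = set(query.split())
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]

    def score(sentence: str) -> int:
        words = set(re.findall(r"[a-z0-9']+", sentence.lower()))
        return len(query_words & words)

    return max(sentences, key=score)

@dataclass
class SQAPipeline:
    transcribe: Callable[[bytes], str]   # ASR: audio -> transcript
    understand: Callable[[str], str]     # NLU: transcript -> cleaned query
    answer: Callable[[str, str], str]    # QA: (query, context) -> answer

    def __call__(self, audio: bytes, context: str) -> str:
        transcript = self.transcribe(audio)
        query = self.understand(transcript)
        return self.answer(query, context)

context = "Hyderabad hosted Interspeech in 2018. Spoken-SQuAD uses synthesized speech."
pipeline = SQAPipeline(toy_asr, strip_disfluencies, overlap_qa)
print(pipeline(b"Um, where was uh Interspeech held in 2018?", context))
```

Keeping each stage behind a plain callable makes it possible to swap in a real recognizer or reader without touching the rest of the pipeline, and to measure how errors from one stage affect the next.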
Spoken-SQuAD test set
Spoken-SQuAD is a spoken question answering dataset built on top of the
SQuAD dataset and released under the CC-BY-SA-4.0 license. In Spoken-SQuAD,
the document is in spoken form, the input question is in the form of text
and the answer to each question is always a span in the document. The spoken
documents were generated from SQuAD textual articles using a Google
text-to-speech system. In addition, corresponding automatic transcripts were
generated using CMU Sphinx. The questions were left in text form. The SQuAD
training set was used to generate the training set of Spoken-SQuAD, and the
SQuAD development set was used to generate the testing set for Spoken-SQuAD.
All the question-answer pairs for which the answer did not exist in the ASR
transcriptions of the associated article were removed. The dataset used here
is the Spoken-SQuAD test set, i.e. the portion derived from the SQuAD dev
split.
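The filtering step described above, which removes question-answer pairs whose answer does not appear in the ASR transcript, can be sketched as follows. The exact matching rule used by the dataset authors is not stated here, so this sketch assumes a case-insensitive, punctuation-insensitive whole-word match. The function names, the dictionary layout of a question-answer pair, and the sample transcript are hypothetical.

```python
import re

def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace (an assumed normalization)."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def keep_answerable(qa_pairs, asr_transcript):
    """Keep pairs with at least one reference answer present in the transcript.

    Padding with spaces makes the substring test match whole words only.
    """
    doc = f" {normalize(asr_transcript)} "
    kept = []
    for pair in qa_pairs:
        answers = [normalize(a) for a in pair["answers"]]
        if any(a and f" {a} " in doc for a in answers):
            kept.append(pair)
    return kept

# Hypothetical transcript with a recognition error ("peach" instead of "speech").
transcript = "the spoken documents were generated using a google text to peach system"
qa_pairs = [
    {"question": "Whose system generated the audio?", "answers": ["Google"]},
    {"question": "What kind of system was used?", "answers": ["text-to-speech"]},
]

for pair in keep_answerable(qa_pairs, transcript):
    print(pair["question"])
```

In this example the second pair is dropped because the recognition error removes its answer span from the transcript, which is the situation the filter is meant to handle.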
Chia-Hsuan Li, Szu-Lin Wu, Chi-Liang Liu, Hung-yi Lee, 2018, Spoken SQuAD: A
Study of Mitigating the Impact of Speech Recognition Errors on Listening
Comprehension, Proceedings of Interspeech 2018, Hyderabad, India