Event

Spoken Document Retrieval and Browsing

Ciprian Chelba

Google

Tuesday, September 02, 2008
12:30

Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, result in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, search emerges as a key application as more and more data is saved.

Speech search has received little attention because large collections of untranscribed spoken material have not been available, mostly due to storage constraints. As storage becomes cheaper, the availability and usefulness of large collections of spoken documents are limited strictly by the lack of adequate technology to exploit them. Manually transcribing speech is expensive and sometimes outright impossible due to privacy concerns. This leads us to explore an automatic approach to searching and navigating spoken document collections.

The talk will focus on techniques for indexing and retrieving spoken audio files and will present results on a corpus of recorded academic lectures (MIT iCampus).

 