Lost in Segmentation: Three Approaches for Speech/Non-Speech Detection in Consumer-Produced Videos
Title | Lost in Segmentation: Three Approaches for Speech/Non-Speech Detection in Consumer-Produced Videos |
Publication Type | Conference Paper |
Year of Publication | 2013 |
Authors | Elizalde, B. Martinez, & Friedland, G. |
Other Numbers | 3420 |
Abstract | Traditional speech/non-speech segmentation systems have been designed for specific acoustic conditions, such as broadcast news or meetings. However, little research has been done on consumer-produced audio. This type of media is constantly growing and has complex characteristics such as low quality recordings, environmental noise and overlapping sounds. This paper discusses an evaluation of three different approaches for speech/non-speech detection on consumer-produced audio. The approaches are state-of-the-art speech/non-speech detectors: one based on Gaussian Mixture Models (GMM), another on Support Vector Machines (SVM), and the last on Neural Networks (NN). Using the TRECVID MED 2012 database, we designed training/testing set combinations to aid the understanding of what speech/non-speech detection on consumer-produced media entails and how traditional approaches to this detection performed in this domain. The results revealed that the cross-domain state-of-the-art GMM and SVM systems tests underperformed |
Acknowledgment | Supported by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20066. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/NBC, or the U.S. Government. |
URL | https://www.icsi.berkeley.edu/pubs/speech/losttranslation13.pdf |
Bibliographic Notes | Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2013), San Jose, California |
Abbreviated Authors | B. Elizalde and G. Friedland |
ICSI Research Group | Audio and Multimedia |
ICSI Publication Type | Article in conference proceedings |