What You Hear Is What You Get: Audio-Based Video Content Analysis

Title: What You Hear Is What You Get: Audio-Based Video Content Analysis
Publication Type: Conference Paper
Year of Publication: 2013
Authors: Elizalde, B. Martinez, Friedland, G., & Ni, K.
Other Numbers: 3610

Audio-based video event detection on user-generated content (UGC) aims to find videos that show an observable event, such as a wedding ceremony or a birthday party. At a lower tier, audio concept detection aims to find a sound or concept, such as music, clapping, or a cat’s meow. Different events are described by different sounds. Video content analysis on UGC is difficult because the data lack structure and vary widely in their acoustics. Video content analysis has been explored mainly through computer vision, but audio is needed to complement the search for cues in this multimedia challenge. This paper presents an approach for each detection task. First, an i-vector system for audio-based video event detection: the system compensates for undesired acoustic variability and extracts information from the acoustic environment of the event recordings, making it a meaningful choice for event detection on UGC. Second, a neural network system based on audio concept ranking, which helps determine and select the most relevant concepts for each event, discard meaningless concepts, and combine ambiguous sounds to enhance a concept.
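The abstract does not spell out the i-vector computation. As a minimal sketch, assuming the standard i-vector formulation from speaker recognition (a GMM universal background model with diagonal covariances and an already trained total variability matrix T, neither of which is specified here), a recording's supervector is modeled as M = m + T w, and its i-vector is the posterior mean w = L⁻¹ Tᵀ Σ⁻¹ F, where L = I + sum over components c of N_c T_cᵀ Σ_c⁻¹ T_c, N_c are the zeroth-order statistics and F the first-order statistics centered on the UBM means. The helper names below are hypothetical, and the cosine scoring against a per-event model i-vector is one common backend, not necessarily the one used in the paper.

```python
from math import sqrt
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the caller's data is untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def extract_ivector(t_matrix: Matrix, sigma: Vector, n_stats: Vector,
                    f_stats: Vector, feat_dim: int) -> Vector:
    """Posterior mean of w in M = m + T w (hypothetical helper).

    t_matrix: C*feat_dim rows of length R (total variability matrix T).
    sigma:    C*feat_dim diagonal UBM covariance entries.
    n_stats:  C zeroth-order Baum-Welch statistics N_c.
    f_stats:  C*feat_dim first-order statistics, centered on the UBM means.
    """
    rank = len(t_matrix[0])
    # Precision L = I + sum_c N_c T_c^T Sigma_c^-1 T_c, accumulated row by row.
    precision = [[1.0 if i == j else 0.0 for j in range(rank)]
                 for i in range(rank)]
    # Right-hand side T^T Sigma^-1 F.
    rhs = [0.0] * rank
    for row, (t_row, var, f) in enumerate(zip(t_matrix, sigma, f_stats)):
        n_c = n_stats[row // feat_dim]  # Gaussian component that owns this row
        for i in range(rank):
            rhs[i] += t_row[i] * f / var
            for j in range(rank):
                precision[i][j] += n_c * t_row[i] * t_row[j] / var
    return solve(precision, rhs)


def cosine_score(w_test: Vector, w_event: Vector) -> float:
    """Cosine similarity between a test i-vector and an event model i-vector."""
    dot = sum(a * b for a, b in zip(w_test, w_event))
    norm = sqrt(sum(a * a for a in w_test)) * sqrt(sum(b * b for b in w_event))
    return dot / norm
```

For a one-dimensional toy case with T = [[2.0]], unit variance, N = 3 and F = 6, `extract_ivector` returns 12/13. The sketch leaves out training T (usually done with EM) and the session compensation step (for example LDA or WCCN applied to the i-vectors) that would address the undesired acoustic variability the abstract mentions.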


This work was partially supported by funding provided to ICSI by the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20066. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/NBC, or the U.S. Government.

Bibliographic Notes

Proceedings of the Bay Area Machine Learning Symposium 2013 (BayLearn), Menlo Park, California

Abbreviated Authors

B. Elizalde, G. Friedland, and K. Ni

ICSI Research Group

Audio and Multimedia

ICSI Publication Type

Article in conference proceedings