City-Identification on Flickr Videos Using Acoustic Features
Title | City-Identification on Flickr Videos Using Acoustic Features |
Publication Type | Technical Report |
Year of Publication | 2011 |
Authors | Lei, H., Choi, J., & Friedland, G. |
Other Numbers | 3077 |
Abstract | This article presents an approach that utilizes audio to discriminate the city of origin of consumer-produced videos, a task that is hard to imagine even for humans. Using a sub-set of the MediaEval Placing Task's Flickr video set, we conducted an experiment with a setup similar to a typical NIST speaker recognition evaluation run. Our assumption is that the audio within the same city might be matched in various ways, e.g., language, typical environmental acoustics, etc., without a single outstanding feature being absolutely indicative. Using the NIST speaker recognition framework, a set of 18 cities across the world are used as targets, and Gaussian Mixture Models are trained on all targets. Audio from videos of a test set is scored against each of the targets, and a set of scores is obtained for pairs of test set files and target city models. The Equal Error Rate (EER), which is obtained at a scoring threshold where the number of false alarms equals the misses, is used as the performance measure of |
Acknowledgment | This work was partially supported by funding provided through the National Geospatial-Intelligence Agency University Research Initiatives program (NGA NURI). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of NGA. |
URL | http://www.icsi.berkeley.edu/pubs/techreports/TR-11-001.pdf |
Bibliographic Notes | ICSI Technical Report TR-11-001 |
Abbreviated Authors | H. Lei, J. Choi, and G. Friedland |
ICSI Research Group | Audio and Multimedia |
ICSI Publication Type | Technical Report |
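
The abstract defines the Equal Error Rate as the operating point where false alarms and misses balance. As a minimal sketch of that definition only, not the report's evaluation code, the Python below sweeps every observed score as a candidate threshold and reports the error rate where the miss rate and false alarm rate are closest. The function name `equal_error_rate` and the example scores are hypothetical, and the sketch compares rates rather than raw counts so that unequal numbers of target and non-target trials are handled.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold.

    target_scores:    scores of trials where the test file IS from the city model
    nontarget_scores: scores of trials where it is NOT
    A trial is accepted when its score is >= the threshold.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("need at least one target and one non-target score")

    best_gap = None
    best_eer = None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # Miss: a true-city trial rejected at this threshold.
        miss_rate = sum(s < threshold for s in target_scores) / len(target_scores)
        # False alarm: a wrong-city trial accepted at this threshold.
        fa_rate = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(miss_rate - fa_rate)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            best_eer = (miss_rate + fa_rate) / 2
    return best_eer


# Made-up scores for illustration; these are not results from the report.
example_targets = [2.0, 1.5, 0.3, 1.1]
example_nontargets = [0.1, 0.4, -0.5, 1.2]
print(equal_error_rate(example_targets, example_nontargets))  # 0.25 for these made-up scores
```

With a finite set of trials the two rates may never be exactly equal, so the sketch averages them at the closest threshold; interpolating between adjacent thresholds is a common alternative.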