Multimodal City-Verification on Flickr Videos Using Acoustic and Textual Features
Title | Multimodal City-Verification on Flickr Videos Using Acoustic and Textual Features |
Publication Type | Conference Paper |
Year of Publication | 2012 |
Authors | Lei, H., Choi, J., & Friedland, G. |
Page(s) | 2273-2276 |
Other Numbers | 3238 |
Abstract | We performed city verification of videos based on their audio and metadata, using the MediaEval Placing Task's video set, which contains consumer-produced videos from the wild. Eighteen cities were used as targets; acoustic and language models were trained for each, and test videos were scored against them. We obtained the first known results for the city-verification task, with a minimum equal error rate (EER) of 21.8%, suggesting that ~80% of test videos, when tested against the correct target city, were identified as belonging to that city. This result is well above chance, even though the videos contained very few city-specific audio and metadata features. We also demonstrated the complementarity of audio and metadata for this task. |
Acknowledgment | This work was partially supported by funding provided to ICSI through National Science Foundation grants IIS-1138599 (EAGER: Collecting Training Videos for Location Estimation with Mechanical Turk) and CNS-1065240 ("Understanding and Managing the Impact of Global Inference on Online Privacy"). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of the National Science Foundation. This work was also partially supported by funding provided through the National Geospatial-Intelligence Agency University Research Initiatives program (NGA NURI #HM11582-10-1-0008). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of NGA. |
URL | http://www.icsi.berkeley.edu/pubs/speech/cityverification12.pdf |
Bibliographic Notes | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), pp. 2273-2276, Kyoto, Japan |
Abbreviated Authors | H. Lei, J. Choi, and G. Friedland |
ICSI Research Group | Audio and Multimedia |
ICSI Publication Type | Article in conference proceedings |
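
The abstract reports its headline result as an equal error rate (EER): the operating point at which the false-rejection rate on target trials equals the false-acceptance rate on non-target trials. At an EER of 21.8%, about 100% - 21.8% = 78.2% of correct-city trials are accepted, which is the "~80%" figure quoted. The following is a minimal sketch of how an EER can be estimated from verification scores. The function name, the threshold sweep, and the toy scores are illustrative assumptions, not the authors' implementation or data.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    target_scores:   scores of trials where the video belongs to the city
    impostor_scores: scores of trials where it does not
    Returns (eer, threshold) at the point where the false-rejection and
    false-acceptance rates are closest. Hypothetical helper for illustration.
    """
    best = None
    for threshold in sorted(set(target_scores) | set(impostor_scores)):
        # A trial is accepted when its score is at or above the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            # Report the midpoint of the two rates at the closest crossing.
            best = (gap, (frr + far) / 2, threshold)
    return best[1], best[2]


# Toy scores (made up for this example, not taken from the paper).
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(targets, impostors)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With real score lists, the fraction of correct-city trials accepted at the EER threshold is simply `1 - eer`.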