Multimodal City-Identification on Flickr Videos Using Acoustic and Textual Features
Title | Multimodal City-Identification on Flickr Videos Using Acoustic and Textual Features |
Publication Type | Technical Report |
Year of Publication | 2012 |
Authors | Lei, H., Choi, J., & Friedland, G. |
Other Numbers | 3301 |
Abstract | We have performed city verification of videos based on the videos' audio and metadata, using videos from the MediaEval Placing Task's video set, which contains consumer-produced videos from the wild. Eighteen cities were used as targets, for which acoustic and language models were trained and against which test videos were scored. We have obtained the first known results for the city-verification task, with a minimum EER of 21.8 percent. This result is well above chance, even though the videos contain very few city-specific audio and metadata features. We have also demonstrated the complementarity of audio and metadata for this task. |
Acknowledgment | This research is supported by NGA NURI grant number HM11582-10-1-0008, NSF EAGER grant IIS-1138599, and NSF Award CNS-1065240. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsors. |
URL | http://www.icsi.berkeley.edu/pubs/techreports/TR-12-007.pdf |
Bibliographic Notes | ICSI Technical Report TR-12-007 |
Abbreviated Authors | H. Lei, J. Choi, and G. Friedland |
ICSI Research Group | Audio and Multimedia |
ICSI Publication Type | Technical Report |