Multimodal Location Estimation

Title: Multimodal Location Estimation
Publication Type: Technical Report
Year of Publication: 2010
Authors: Friedland, G., Vinyals, O., & Darrell, T.
Other Numbers: 2926
Abstract

In this article we define a multimedia content analysis problem, which we call multimodal location estimation: Given a video/image/audio file, the task is to determine where it was recorded. A single indication, such as a unique landmark, might already pinpoint a location precisely. In most cases, however, a combination of evidence from the visual and the acoustic domain will only narrow down the set of possible answers. Therefore, approaches to tackle this task should be inherently multimedia. While the task is hard, in fact sometimes unsolvable, training data can be leveraged from the Internet in large amounts. Moreover, even partially successful automatic estimation of location opens up new possibilities in video content matching, archiving, and organization. It could revolutionize law enforcement and computer-aided intelligence agency work, especially since both semi-automatic and fully automatic approaches would be possible. In this article, we describe our idea of growing multimodal location estimation as a research field in the multimedia community. Based on examples and scenarios, we propose a multimedia approach to leverage cues from the visual and the acoustic portions of a video as well as from given metadata. We also describe experiments to estimate the amount of available training data that could potentially be used as publicly available infrastructure for research in this field. Finally, we present an initial set of results based on acoustic and visual cues and discuss the massive challenges involved and some possible paths to solutions.

Acknowledgment

This work was supported by funding provided by the National Geospatial-Intelligence Agency (NGA) through an NGA University Research Initiatives (NURI) grant. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of NGA.

URL: http://www.icsi.berkeley.edu/pubs/techreports/TR-10-007.pdf
Bibliographic Notes

ICSI Technical Report TR-10-007

Abbreviated Authors

G. Friedland, O. Vinyals, and T. Darrell

ICSI Research Group

Vision

ICSI Publication Type

Technical Report