Generating Natural-Language Video Descriptions Using Text-Mined Knowledge
Title | Generating Natural-Language Video Descriptions Using Text-Mined Knowledge |
Publication Type | Conference Paper |
Year of Publication | 2013 |
Authors | Krishnamoorthy, N., Malkarnenkar, G., Mooney, R., Saenko, K., & Guadarrama, S. |
Other Numbers | 3445 |
Abstract | We present a holistic data-driven technique that generates natural-language descriptions for videos. We combine the output of state-of-the-art object and activity detectors with "real-world" knowledge to select the most probable subject-verb-object triplet for describing a video. We show that this knowledge, automatically mined from web-scale text corpora, enhances the triplet selection algorithm by providing it contextual information and leads to a four-fold increase in activity identification. Unlike previous methods, our approach can annotate arbitrary videos without requiring the expensive collection and annotation of a similar training video corpus. We evaluate our technique against a baseline that does not use text-mined knowledge and show that humans prefer our descriptions 61 percent of the time. |
Acknowledgment | This work was partially supported by funding provided to ICSI by the U.S. Defense Advanced Research Projects Agency (DARPA). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of DARPA or of the U.S. Government. |
URL | https://www.icsi.berkeley.edu/pubs/vision/generatingnatural13.pdf |
Bibliographic Notes | Proceedings of the 27th AAAI Conference on Artificial Intelligence (AAAI-13), Bellevue, Washington |
Abbreviated Authors | N. Krishnamoorthy, G. Malkarnenkar, R. J. Mooney, K. Saenko, and S. Guadarrama |
ICSI Research Group | Vision |
ICSI Publication Type | Article in conference proceedings |
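The abstract describes selecting the most probable subject-verb-object triplet by combining visual detector outputs with knowledge mined from text corpora. The following is a minimal illustrative sketch of that general idea, not the paper's actual algorithm: all detector confidences, corpus scores, the interpolation weight, and the smoothing constant are invented for illustration.

```python
# Hypothetical sketch: pick the best subject-verb-object (SVO) triplet by
# interpolating visual detector confidences with text-mined co-occurrence
# scores. All numbers below are invented for illustration.
from itertools import product

# Assumed object/activity detector outputs: candidates with confidences.
subjects = {"person": 0.9, "dog": 0.4}
verbs = {"ride": 0.5, "walk": 0.45}
objects = {"bicycle": 0.8, "leash": 0.3}

# Assumed text-mined prior: co-occurrence scores for SVO triplets,
# as might be estimated from a web-scale corpus.
corpus_prior = {
    ("person", "ride", "bicycle"): 0.7,
    ("person", "walk", "dog"): 0.6,
    ("dog", "walk", "leash"): 0.2,
}

def triplet_score(s, v, o, alpha=0.5):
    """Linearly interpolate visual confidence and corpus prior."""
    visual = subjects[s] * verbs[v] * objects[o]
    prior = corpus_prior.get((s, v, o), 0.01)  # smoothing for unseen triplets
    return alpha * visual + (1 - alpha) * prior

# Exhaustively score all candidate triplets and keep the best one.
best = max(product(subjects, verbs, objects), key=lambda t: triplet_score(*t))
print(best)  # → ('person', 'ride', 'bicycle')
```

Here the corpus prior rescues the plausible triplet even when individual detector confidences are noisy, which is the contextual effect the abstract credits for the improvement in activity identification.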