Open-Vocabulary Object Retrieval
Title | Open-Vocabulary Object Retrieval |
Publication Type | Conference Paper |
Year of Publication | 2014 |
Authors | Guadarrama, S., Rodner E., Saenko K., Zhang N., Farrell R., Donahue J., & Darrell T. |
Other Numbers | 3694 |
Abstract | In this paper, we address the problem of retrieving objects based on open-vocabulary natural language queries: Given a phrase describing a specific object, e.g., the corn flakes box, the task is to find the best match in a set of images containing candidate objects. When naming objects, humans tend to use natural language with rich semantics, including basic-level categories, fine-grained categories, and instance-level concepts such as brand names. Existing approaches to large-scale object recognition fail in this scenario, as they expect queries that map directly to a fixed set of pre-trained visual categories, e.g., ImageNet synset tags. We address this limitation by introducing a novel object retrieval method. Given a candidate object image, we first map it to a set of words that are likely to describe it, using several learned image-to-text projections. We also propose a method for handling open-vocabularies, i.e., words not contained in the training data. We then compare the natural language query to the sets of words predicted for each candidate and select the best match. Our method can combine category- and instance-level semantics in a common representation. We present extensive experimental results on several datasets using both instance-level and category-level matching and show that our approach can accurately retrieve objects based on extremely varied open-vocabulary queries. The source code of our approach will be publicly available together with pre-trained models at http://openvoc.berkeleyvision.org and could be directly used for robotics applications. |
Bibliographic Notes | Proceedings of the 10th Annual Conference on Robotics: Science and Systems (RSS X), Berkeley, California |
Abbreviated Authors | S. Guadarrama, E. Rodner, K. Saenko, N. Zhang, R. Farrell, J. Donahue, and T. Darrell |
ICSI Research Group | Vision |
ICSI Publication Type | Article in conference proceedings |
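
The retrieval step summarized in the abstract (predict a set of words for each candidate image, compare the query against each set, select the best match) can be illustrated with a toy sketch. This is not the authors' implementation: the word sets below are hard-coded stand-ins for the output of the learned image-to-text projections, the names `tokenize`, `jaccard`, and `retrieve` are hypothetical, and Jaccard overlap is an assumed placeholder for the paper's actual matching score.

```python
def tokenize(phrase):
    """Lowercase a phrase and split it on whitespace into a set of words."""
    return set(phrase.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, candidates):
    """Return the candidate whose predicted word set best matches the query.

    `candidates` maps a candidate name to the set of words predicted for it.
    """
    query_words = tokenize(query)
    return max(candidates, key=lambda name: jaccard(query_words, candidates[name]))


# Hypothetical predicted word sets (assumed, not taken from the paper).
candidates = {
    "image_1": {"box", "cereal", "corn", "flakes"},
    "image_2": {"bottle", "ketchup", "red"},
    "image_3": {"mug", "coffee", "white"},
}

best = retrieve("the corn flakes box", candidates)
print(best)
```

Because the word sets can mix category words ("cereal", "box") with instance-level words ("corn", "flakes"), a single set comparison covers both kinds of semantics, which is the property the abstract highlights.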