Anytime Recognition of Objects and Scenes

Humans are capable of perceiving a scene at a glance, and obtain deeper understanding with additional time. Similarly, visual recognition deployments should be robust to varying computational budgets. Such situations require Anytime recognition ability, which is rarely considered in computer vision research. We present a method for learning dynamic policies to optimize Anytime performance in visual architectures. Our model sequentially orders feature computation and performs subsequent classification. Crucially, decisions are made at test time and depend on observed data and intermediate results. We show the applicability of this system to standard problems in scene and object recognition. On suitable datasets, we can incorporate a semantic back-off strategy that gives maximally specific predictions for a desired level of accuracy; this provides a new view on the time course of human visual perception.


