Robotic Vision

Principal Investigator(s): 
Trevor Darrell

To perform useful tasks in everyday human environments, robots must be able to both understand and communicate the sensations they experience during haptic interactions with objects. Toward this goal, vision researchers at ICSI augmented the Willow Garage PR2 robot with a pair of SynTouch BioTac sensors to capture rich tactile signals during the execution of four exploratory procedures on 60 household objects. In a parallel experiment, human subjects blindly touched the same objects and selected binary haptic adjectives from a predetermined set of 25 labels. The researchers developed several machine-learning algorithms to discover the meaning of each adjective from the robot's sensory data. The most successful algorithms were those that intelligently combined the static and dynamic components of the data recorded during all four exploratory procedures. The best of these approaches produced an average adjective classification F1 score of 0.77, higher than that of an average human subject. This work is being expanded to visual assessment of haptic properties using a joint visuo-haptic database.
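As a rough illustration of this kind of per-adjective evaluation (not the actual ICSI pipeline), the sketch below trains one binary classifier per adjective on concatenated static and dynamic features and reports the average F1 score across adjectives. The feature dimensions, synthetic data, and choice of a linear SVM are illustrative assumptions only.

    # Illustrative sketch: one binary classifier per haptic adjective, trained
    # on concatenated static and dynamic features from all four exploratory
    # procedures. Data here is synthetic; dimensions and the linear SVM are
    # assumptions, not the method used in the reported work.
    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    n_objects = 60                      # household objects touched by the PR2
    static_dim, dynamic_dim = 20, 40    # assumed per-procedure feature sizes
    n_procedures = 4                    # exploratory procedures per object

    # Concatenate static and dynamic features from all four procedures.
    X = rng.normal(size=(n_objects, n_procedures * (static_dim + dynamic_dim)))

    # One binary label per adjective (e.g., "squishy", "rough"); 25 in total.
    n_adjectives = 25
    Y = rng.integers(0, 2, size=(n_objects, n_adjectives))

    scores = []
    for a in range(n_adjectives):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, Y[:, a], test_size=0.25, random_state=a)
        clf = LinearSVC(C=1.0).fit(X_tr, y_tr)
        scores.append(f1_score(y_te, clf.predict(X_te), zero_division=0))

    print(f"average adjective F1: {np.mean(scores):.2f}")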

Vision researchers developed a grasp affordance approach that combines image-based, category-level detection methods with 3D point cloud data. Purely 3D-based grasp methods have been widely used for robotics tasks, but most lack the ability to grasp an object at a specific part, e.g., a cup by its handle only and not from the inside, and they are hard to apply to flat objects lying on a table plane. Image-only methods, e.g., the Deformable Part Model, have shown promising results for object classification and pose estimation but are often too inaccurate to be applied to physical real-world objects. Combining 3D data with a 2D grasp-point estimation pipeline makes it possible to grasp flat objects, to grasp specific parts of a 3D point cloud, and to generalize over object instances; a sketch of this fusion idea is given below. The researchers performed experiments on a set of household objects and will compare their combined method to 2D and 3D baseline approaches.
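The following minimal sketch illustrates the 2D/3D fusion concept under simple assumptions (a pinhole camera model and a hypothetical detected part region); it is not the researchers' actual pipeline. 3D points are projected into the image, points falling inside the 2D detection of a part (e.g., a cup handle) are kept, and their centroid serves as a candidate grasp point.

    # Illustrative sketch of fusing a 2D part detection with a 3D point cloud:
    # project the cloud into the image with a pinhole model, keep the points
    # that fall inside the detected part's bounding box, and use their centroid
    # as a candidate grasp point. The camera intrinsics and box are made-up
    # values; the real pipeline is more involved than this.
    import numpy as np

    def grasp_point_from_detection(cloud_xyz, K, box):
        """cloud_xyz: (N, 3) points in the camera frame; K: 3x3 intrinsics;
        box: (u_min, v_min, u_max, v_max) 2D detection, e.g., a cup handle."""
        valid = cloud_xyz[:, 2] > 0          # points in front of the camera
        uv = (K @ cloud_xyz[valid].T).T      # perspective projection
        uv = uv[:, :2] / uv[:, 2:3]
        u_min, v_min, u_max, v_max = box
        inside = ((uv[:, 0] >= u_min) & (uv[:, 0] <= u_max) &
                  (uv[:, 1] >= v_min) & (uv[:, 1] <= v_max))
        part_points = cloud_xyz[valid][inside]
        if len(part_points) == 0:
            return None
        return part_points.mean(axis=0)      # centroid as grasp candidate

    # Example with synthetic data and typical RGB-D camera intrinsics.
    K = np.array([[525.0, 0, 319.5], [0, 525.0, 239.5], [0, 0, 1.0]])
    cloud = np.random.default_rng(1).uniform(
        [-0.3, -0.3, 0.5], [0.3, 0.3, 1.0], (5000, 3))
    print(grasp_point_from_detection(cloud, K, box=(300, 200, 360, 280)))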

They have also investigated the grounding of spatial relations for human-robot interaction. Natural language understanding is a key requirement for humans and robots to interact naturally with each other. However, in order to understand language, a robot needs to create the appropriate grounding between symbols in a sentence and the physical world as perceived by its sensors. Building on previous work that highlights the relevance of spatial relations in human-robot interaction, and on the idea that automatically learned models can lead to better performance, greater flexibility, and stronger adaptation capability, the researchers developed a system that learns models for spatial prepositions and object recognition in order to understand statements that refer to objects (nouns) and their spatial relationships (prepositions), and to execute different commands (verbs) upon request.
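The sketch below illustrates one simple way such grounding can be set up; the features, labels, and classifier are assumptions for illustration, not the system's actual models. A classifier maps geometric features of an object/landmark pair to a preposition label, and the learned model is then used to score candidate referents for a spoken command.

    # Hypothetical sketch of grounding spatial prepositions: a classifier maps
    # geometric features of an (object, landmark) pair -- here just the
    # relative centroid offset and distance -- to a preposition label such as
    # "left of" or "on". Feature set and classifier are illustrative only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def relative_features(obj_xyz, landmark_xyz):
        dx, dy, dz = np.asarray(obj_xyz) - np.asarray(landmark_xyz)
        return [dx, dy, dz, float(np.linalg.norm([dx, dy, dz]))]

    # Toy training pairs labelled by a human tutor.
    pairs = [
        (( 0.30,  0.00, 0.00), (0.0, 0.0, 0.0), "right of"),
        ((-0.25,  0.00, 0.00), (0.0, 0.0, 0.0), "left of"),
        (( 0.00,  0.00, 0.12), (0.0, 0.0, 0.0), "on"),
        (( 0.28,  0.05, 0.00), (0.0, 0.0, 0.0), "right of"),
        ((-0.30, -0.05, 0.00), (0.0, 0.0, 0.0), "left of"),
        (( 0.02,  0.00, 0.10), (0.0, 0.0, 0.0), "on"),
    ]
    X = [relative_features(o, l) for o, l, _ in pairs]
    y = [label for _, _, label in pairs]

    model = LogisticRegression(max_iter=1000).fit(X, y)

    # Grounding a command such as "pick up the cup left of the bowl":
    # score each candidate cup against the learned "left of" model.
    cup_candidates = {"cup_1": (-0.2, 0.1, 0.0), "cup_2": (0.25, 0.0, 0.0)}
    bowl = (0.0, 0.0, 0.0)
    for name, pos in cup_candidates.items():
        proba = model.predict_proba([relative_features(pos, bowl)])
        score = proba[0][list(model.classes_).index("left of")]
        print(f"{name}: P(left of bowl) = {score:.2f}")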