On Audio-Visual Information for Speech

Hang Su

ICSI

Tuesday, September 22, 2015
12:30 p.m., Conference Room 5A

This talk summarizes the work done during my internship at MSR this summer, focusing on utilizing visual features for speech recognition and other applications. On a real-life tech-talk dataset, we observed a minor improvement from combining visual features under noisy conditions. We further conducted research on detecting audio-visual synchrony, achieving an accuracy of 88.2 percent at the utterance level. The talk will also comment on related work published at last year's Interspeech conference.