Addressee Detection for Dialog Systems Using Temporal and Spectral Dimensions of Speaking Style

Title: Addressee Detection for Dialog Systems Using Temporal and Spectral Dimensions of Speaking Style
Publication Type: Conference Paper
Year of Publication: 2013
Authors: Shriberg, E., Stolcke, A., & Ravuri, S.
Other Numbers: 3612
Abstract

As dialog systems evolve to handle unconstrained input and for use in open environments, addressee detection (detecting speech to the system versus to other people) becomes an increasingly important challenge. We study a corpus in which speakers talk both to a system and to each other, and model two dimensions of speaking style that talkers modify when changing addressee: speech rhythm and vocal effort. For each dimension we design features that do not require speech recognition output, session normalization, speaker normalization, or dialog context. Detection experiments show that rhythm and effort features are complementary, outperform lexical models based on recognized words, and reduce error rates even if word recognition is error-free. Simulated online processing experiments show that all features need only the first couple seconds of speech. Finally, we find that temporal and spectral stylistic models can be trained on outside corpora, such as ATIS and ICSI meetings, with reasonable generalization to the target task, thus showing promise for domain-independent computer-versus-human addressee detectors.

URL: https://www.icsi.berkeley.edu/pubs/speech/addresseedetection13.pdf
Bibliographic Notes

Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), Lyon, France

Abbreviated Authors

E. Shriberg, A. Stolcke, and S. Ravuri

ICSI Research Group

Speech

ICSI Publication Type

Article in conference proceedings