Where did I go Wrong?: Identifying Troublesome Segments for Speaker Diarization Systems

Authors: Knox, M. Tai, Mirghafori, N., & Friedland, G.
This work identifies the types of segments that are difficult for speaker diarization systems. The diarization outputs of five state-of-the-art systems are analyzed on short and long segments, as well as on segments surrounding speaker changepoints. For all five systems, the diarization error rate (DER) increased as segment duration decreased. The systems also performed much worse on segments immediately preceding and following speaker changepoints than on their respective counterparts. In fact, at least 40% of the DER for all five systems is attributed to time within 0.5 seconds of a speaker changepoint. We hope these results motivate future improvements to speaker diarization systems.
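To make the changepoint statistic concrete, the following is a minimal sketch of how the share of errors near speaker changes could be measured. It is not the authors' scoring procedure. It assumes frame-level labels at a fixed frame length, one speaker per frame, and hypothesis labels that have already been mapped to the reference speakers; a full DER computation also handles missed speech, false alarms, overlapping speech, and time weighting. The function name `changepoint_error_share` and its parameters are hypothetical.

```python
def changepoint_error_share(ref, hyp, frame_sec=0.01, window_sec=0.5):
    """Fraction of mismatched frames that lie within window_sec of a
    reference speaker changepoint (illustrative simplification of DER analysis).

    ref, hyp: equal-length sequences of per-frame speaker labels.
    Assumes hyp labels are already mapped onto the ref speaker names.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")

    # Work in whole frames to avoid floating-point comparisons.
    window_frames = round(window_sec / frame_sec)

    # A changepoint at index c sits on the boundary between frames c-1 and c.
    changepoints = [c for c in range(1, len(ref)) if ref[c] != ref[c - 1]]

    errors = 0
    near = 0
    for i, (r, h) in enumerate(zip(ref, hyp)):
        if r == h:
            continue
        errors += 1
        # Frame i is within the window if it falls in [c - w, c + w).
        if any(c - window_frames <= i < c + window_frames for c in changepoints):
            near += 1

    return near / errors if errors else 0.0
```

Calling the function with reference and hypothesis label lists returns a value between 0 and 1; a value of 0.4 or higher would correspond, under these simplifying assumptions, to the pattern reported above.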


This work was partially supported by funding provided to ICSI through National Science Foundation grant OISE:1135365 (“International: An Analysis of Speaker Diarization Systems Errors”). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of the National Science Foundation.

Proceedings of the 13th Annual Conference of the International Speech Communication Association (InterSpeech 2012), Portland, Oregon


