The ICSI RT-09 Speaker Diarization System

Title: The ICSI RT-09 Speaker Diarization System
Publication Type: Journal Article
Year of Publication: 2012
Authors: Friedland, G., Janin A., Imseng D., Anguera X., Gottlieb L., Huijbregts M., Knox M. Tai, & Vinyals O.
Published in: IEEE Transactions on Audio, Speech, and Language Processing
Volume: 20
Issue: 2
Page(s): 371-381
Other Numbers: 3192
Abstract

The speaker diarization system developed at the International Computer Science Institute (ICSI) has played a prominent role in the speaker diarization community, and many researchers in the Rich Transcription community have adopted methods and techniques developed for the ICSI speaker diarization engine. Although there have been many related publications over the years, previous articles only presented changes and improvements rather than a description of the full system. Attempting to replicate the ICSI speaker diarization system as a complete entity would require an extensive literature review, and might ultimately fail due to component description version mismatches. This article therefore presents the first full conceptual description of the ICSI speaker diarization system as presented to the National Institute of Standards and Technology Rich Transcription 2009 (NIST RT-09) evaluation, which consists of online and offline subsystems, multi-stream and single-stream implementations, and audio and audio-visual approaches. Some of the components, such as the online system, have not been previously described. The article also includes all necessary preprocessing steps, such as Wiener filtering, speech activity detection and beamforming.

Index Terms: Speaker Diarization, Machine Learning, Gaussian Mixture Models (GMM)

Acknowledgment

This work was sponsored by the Swiss NSF through the National Center of Competence in Research (NCCR) on “Interactive Multimodal Information Management” (IM2, www.im2.ch) and the European Integrated Project on “Augmented Multiparty Interaction with Distance Access” (AMIDA, www.amidaproject.org).

URL: http://www.icsi.berkeley.edu/pubs/speech/rt09speaker11.pdf
Bibliographic Notes

IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 2, pp. 371-381

Abbreviated Authors

G. Friedland, A. Janin, D. Imseng, X. Anguera, L. Gottlieb, M. Huijbregts, M. Knox, and O. Vinyals

ICSI Research Group

Speech

ICSI Publication Type

Article in journal or magazine