The ICSI RT-09 Speaker Diarization System
Title | The ICSI RT-09 Speaker Diarization System |
Publication Type | Journal Article |
Year of Publication | 2012 |
Authors | Friedland, G., Janin A., Imseng D., Anguera X., Gottlieb L., Huijbregts M., Knox M. Tai, & Vinyals O. |
Published in | IEEE Transactions on Audio, Speech, and Language Processing |
Volume | 20 |
Issue | 2 |
Page(s) | 371-381 |
Other Numbers | 3192 |
Abstract | The speaker diarization system developed at the International Computer Science Institute (ICSI) has played a prominent role in the speaker diarization community, and many researchers in the Rich Transcription community have adopted methods and techniques developed for the ICSI speaker diarization engine. Although there have been many related publications over the years, previous articles only presented changes and improvements rather than a description of the full system. Attempting to replicate the ICSI speaker diarization system as a complete entity would require an extensive literature review, and might ultimately fail due to component description version mismatches. This article therefore presents the first full conceptual description of the ICSI speaker diarization system as presented to the National Institute of Standards and Technology Rich Transcription 2009 (NIST RT-09) evaluation, which consists of online and offline subsystems, multi-stream and single-stream implementations, and audio and audio-visual approaches. Some of the components, such as the online system, have not been previously described. The article also includes all necessary preprocessing steps, such as Wiener filtering, speech activity detection and beamforming. Index Terms: Speaker Diarization, Machine Learning, Gaussian Mixture Models (GMM) |
Acknowledgment | This work was sponsored by the Swiss NSF through the National Center of Competence in Research (NCCR) on Interactive Multimodal Information Management (IM2, www.im2.ch) and the European Integrated Project on Augmented Multiparty Interaction with Distance Access (AMIDA, www.amidaproject.org). |
URL | http://www.icsi.berkeley.edu/pubs/speech/rt09speaker11.pdf |
Bibliographic Notes | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, Issue 2, pp. 371-381 |
Abbreviated Authors | G. Friedland, A. Janin, D. Imseng, X. Anguera, L. Gottlieb, M. Huijbregts, M. Knox, and O. Vinyals |
ICSI Research Group | Speech |
ICSI Publication Type | Article in journal or magazine |