Speech/Speaker Segmentation Based on the Bayesian Information Criterion
Constantine Kotropoulos
Aristotle University of Thessaloniki
Tuesday, May 12, 2009
12:30
Speaker segmentation based on the Bayesian Information Criterion (BIC) is addressed. A new theoretical formulation of BIC is derived by applying centering and simultaneous diagonalization. This formulation is considerably more computationally efficient than the standard BIC when the covariance matrices are estimated by estimators other than the usual maximum likelihood ones (e.g. M-estimators). A closely related problem is automatic phone segmentation based on model selection criteria. Segmenting speech at the phone level demands high resolution in the short-time analysis of the audio signal; otherwise, the limited information available at such a small scale would be too restrictive for an efficient characterization of the signal. We investigate the phone boundary detection efficiency of entropy-based and Bayesian model selection criteria in continuous speech, using the DISTBIC hybrid segmentation algorithm. DISTBIC is a text-independent, bottom-up approach that identifies sequential model changes by combining metric distances with statistical hypothesis testing. To cope with the small sample sizes and detect the phone boundaries accurately, we employ an information criterion corrected for small samples while modelling speech samples with the generalised Gamma distribution, which offers a more efficient parametric characterisation of speech in the frequency domain than the Gaussian distribution. Using robust statistics and small sample corrections in the baseline DISTBIC algorithm, phone boundary detection accuracy is significantly improved, while false alarms are reduced. We also demonstrate further improvement in phonemic segmentation by taking into account how the model parameters are related in the probability density functions of the underlying hypotheses as well as in the model selection via the information complexity criterion, and by employing M-estimators of the model parameters.
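For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of the standard BIC change test with full-covariance Gaussian models and maximum likelihood covariance estimates. It is not the centered, simultaneously diagonalized formulation presented in the talk, nor DISTBIC itself. The function names `delta_bic` and `best_split`, the `penalty_weight` parameter, and the `margin` default are illustrative assumptions.

```python
import numpy as np


def delta_bic(frames, split, penalty_weight=1.0):
    """Standard delta-BIC for one candidate change point.

    frames: (n, d) array of feature vectors (e.g. cepstral coefficients).
    split:  index of the first frame assigned to the second segment.
    A positive value favours two Gaussian models over a single one.
    """
    n, d = frames.shape
    left, right = frames[:split], frames[split:]

    def logdet_cov(x):
        # Maximum likelihood covariance (bias=True divides by the sample count).
        cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Log-likelihood gain of modelling the two segments separately.
    gain = 0.5 * (n * logdet_cov(frames)
                  - len(left) * logdet_cov(left)
                  - len(right) * logdet_cov(right))
    # Penalty for the extra mean vector and covariance matrix.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n)
    return gain - penalty


def best_split(frames, margin=20, penalty_weight=1.0):
    """Return (index, score) of the candidate with the largest delta-BIC.

    margin keeps enough frames on each side for a stable covariance
    estimate; its default here is an arbitrary illustrative choice.
    """
    n = len(frames)
    scores = {i: delta_bic(frames, i, penalty_weight)
              for i in range(margin, n - margin)}
    i = max(scores, key=scores.get)
    return i, scores[i]
```

A change point is declared in the analysis window only when the best score is positive. In this sketch, robust covariance estimates such as the M-estimators mentioned in the abstract would replace the `np.cov` call.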
Bio: Constantine Kotropoulos received the Diploma degree with honors in Electrical Engineering in 1988 and the PhD degree in Electrical & Computer Engineering in 1993, both from the Aristotle University of Thessaloniki. He is currently an Associate Professor in the Department of Informatics at the Aristotle University of Thessaloniki. Since September 1, 2008, and for one calendar year, he has been a visiting research scholar in the Department of Electrical and Computer Engineering at the University of Delaware, U.S.A. He also conducted research in the Signal Processing Laboratory at Tampere University of Technology, Finland, during the summer of 1993. He has co-authored 38 journal papers and 144 conference papers, and has contributed 6 chapters to edited books in his areas of expertise. He is co-editor of the book "Nonlinear Model-Based Image/Video Processing and Analysis" (J. Wiley and Sons, 2001). His current research interests include audio, speech, and language processing; signal processing; pattern recognition; multimedia information retrieval; biometric authentication techniques; and human-centered multimodal computer interaction. Prof. Kotropoulos was a scholar of the State Scholarship Foundation of Greece and the Bodossaki Foundation. He is a senior member of the IEEE and a member of EURASIP, IAPR, ISCA, and the Technical Chamber of Greece. He is a member of the Editorial Board of the Advances in Multimedia journal and serves as a EURASIP local liaison officer for Greece.