Longer Features: They Do a Speech Detector Good

Title: Longer Features: They Do a Speech Detector Good
Publication Type: Conference Paper
Year of Publication: 2012
Authors: Tsai, T. J., & Morgan, N.
Other Numbers: 3363

We incorporated spectrotemporal features into a speech activity detection (SAD) task on the Speech in Noisy Environments 2 (SPINE2) data set. The features were generated by applying 2D Gabor filters to the mel spectrogram, measuring the strength of various spectral and temporal modulation frequencies in different patches of the spectrogram. Across several different back-ends, the Gabor features significantly outperformed MFCCs, yielding relative reductions in equal error rate (EER) of 40% to 50%. Compared with the other back-ends, AdaBoost with tree stumps performed particularly well with Gabor features and particularly poorly with MFCCs. An investigation into the reasons for this disparity suggests that the most useful features for SAD incorporate information over longer time scales.
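As an illustration of the feature extraction described above, the following is a minimal sketch of applying a 2D Gabor filter to a mel spectrogram. It is not the authors' implementation: the Hann envelope, the use of only the real part of the filter, the kernel sizes, the zero padding, and the names `gabor_kernel` and `filter_spectrogram` are all assumptions made for this example.

```python
import numpy as np


def gabor_kernel(omega_k, omega_n, size_k=9, size_n=25):
    """Real-valued 2D Gabor kernel: a cosine carrier under a Hann envelope.

    omega_k: spectral modulation frequency (radians per mel channel)
    omega_n: temporal modulation frequency (radians per frame)
    size_k, size_n: odd kernel extents (illustrative defaults, not from the paper)
    """
    k = np.arange(size_k) - size_k // 2  # channel offsets around the centre
    n = np.arange(size_n) - size_n // 2  # frame offsets around the centre
    # Hann windows with the zero endpoints trimmed off.
    envelope = np.outer(np.hanning(size_k + 2)[1:-1],
                        np.hanning(size_n + 2)[1:-1])
    carrier = np.cos(omega_k * k[:, None] + omega_n * n[None, :])
    kernel = envelope * carrier
    # Remove the DC component so a flat spectrogram patch gives zero response.
    return kernel - kernel.mean()


def filter_spectrogram(mel_spec, kernel):
    """Correlate the kernel with a zero-padded spectrogram (channels x frames).

    Returns an array with the same shape as mel_spec.
    """
    pad_k, pad_n = kernel.shape[0] // 2, kernel.shape[1] // 2
    padded = np.pad(mel_spec, ((pad_k, pad_k), (pad_n, pad_n)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel.shape)
    # Sum each spectrogram patch weighted by the kernel.
    return np.einsum("ijkl,kl->ij", windows, kernel)
```

A bank of such kernels, one per combination of spectral and temporal modulation frequency, would give one response map each; the longer the kernel's extent in frames, the longer the time scale each feature summarizes.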


This work was partially supported by funding provided to ICSI by the U.S. Defense Advanced Research Projects Agency (DARPA) under contract number D10PC20024. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of DARPA or of the U.S. Government.

Bibliographic Notes

Proceedings of the 13th Annual Conference of the International Speech Communication Association (InterSpeech 2012), Portland, Oregon

Abbreviated Authors

T. J. Tsai and N. Morgan

ICSI Research Group


ICSI Publication Type

Article in conference proceedings