Using Fast and Slow Modulations to Model Human Hearing of Fast and Slow Speech

Publication Type: Technical Report
Year of Publication: 2015
Authors: Chang, S.-Y., Morgan, N., Raju, A., Alwan, A., & Kreiman, J.
Other Numbers: 3804

A collaboration between the Speech Processing and Auditory Perception laboratory at UCLA and the Speech Group at ICSI focused on refining the simple models used in automatic speech recognition (ASR) with representations that have been filtered in the modulation domain to better match human perception. To measure the effects of this modification quantitatively, UCLA collected consonant-vowel-consonant (CVC) stimuli uttered both quickly and more slowly, and conducted perceptual tests on clean and noisy versions of the stimuli. The ICSI team then ran tests to determine whether including Gabor-filtered spectrograms with lower or higher temporal modulations would yield recognition results that correlate better with human perception. Here we report results confirming an improvement in this correlation, particularly for noisy and rapid speech, along with an improvement in accuracy. Overall accuracies in noise for all systems tested, though, were quite poor, suggesting that further auditory modeling might be necessary to improve the modeling of human performance on this task.
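The Gabor-filtered spectrograms mentioned above are a way of separating slow from fast temporal modulations. As a rough illustration only, the Python sketch below builds a complex, temporal-only Gabor kernel (a Hann-windowed complex exponential with its DC response removed) and applies it along the time axis of every channel of a log spectrogram. This is not the authors' implementation, nor the Meyer/Schädler filter bank acknowledged below; the 100 frames/s rate, the two-period envelope length, and the helper names `temporal_gabor` and `modulation_filter` are hypothetical choices made for the example.

```python
import numpy as np


def temporal_gabor(mod_hz, frame_rate=100.0, periods=2.0):
    """Complex 1-D Gabor kernel tuned to one temporal modulation frequency.

    mod_hz:     center temporal modulation frequency in Hz (must be > 0).
    frame_rate: spectrogram frames per second (assumed 100, i.e. a 10 ms hop).
    periods:    number of modulation periods spanned by the Hann envelope
                (an assumed value, chosen only for illustration).
    """
    if mod_hz <= 0:
        raise ValueError("mod_hz must be positive")
    omega = 2 * np.pi * mod_hz / frame_rate                # radians per frame
    length = int(np.ceil(periods * frame_rate / mod_hz)) | 1  # force odd length
    n = np.arange(length) - length // 2                    # centered frame index
    envelope = np.hanning(length)                          # smooth Hann envelope
    kernel = envelope * np.exp(1j * omega * n)             # modulated carrier
    # Subtract a scaled copy of the envelope so the kernel sums to zero,
    # i.e. it ignores constant (DC) energy in each channel.
    kernel -= envelope * kernel.sum() / envelope.sum()
    return kernel


def modulation_filter(log_spec, mod_hz, frame_rate=100.0, periods=2.0):
    """Filter each channel of a (channels, frames) array along time.

    Returns the magnitude of the complex Gabor response, with the same
    shape as the input (a centered slice of the full convolution).
    """
    kernel = temporal_gabor(mod_hz, frame_rate, periods)
    half = len(kernel) // 2
    n_frames = log_spec.shape[1]
    out = np.empty(log_spec.shape, dtype=float)
    for c, row in enumerate(log_spec):
        full = np.convolve(row, kernel, mode="full")
        out[c] = np.abs(full[half:half + n_frames])
    return out
```

For instance, `modulation_filter(spec, 4.0)` and `modulation_filter(spec, 16.0)` would give a slow-modulation and a fast-modulation view of the same spectrogram; these center frequencies are illustrative and are not taken from the report. A full spectro-temporal Gabor filter would also modulate along the frequency axis; the kernel here is the special case with zero spectral modulation and a one-channel spectral extent.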


We are indebted to Bernd Meyer and Marc Schädler for their versions of Gabor filters, which we routinely use. Last but not least, we acknowledge the support of NSF Award 1248047. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors or originators and do not necessarily reflect the views of the National Science Foundation. Note: the title of the NSF grant at ICSI is "Towards Modeling Human Speech Confusions in Noise." This was a project of the Speech Group.

Bibliographic Notes

ICSI Technical Report TR-15-003

Abbreviated Authors

S.-Y. Chang, N. Morgan, A. Raju, A. Alwan, and J. Kreiman

ICSI Research Group

Speech