Event

 
 

Speech Enhancement based on the Modified Phase-Opponency Model

Prof. Carol Espy-Wilson

University of Maryland

Friday, February 27, 2009
11:00-12:00

A major issue for speech recognition technology is its performance in everyday noisy environments. In this talk I will discuss a speech enhancement algorithm we have developed that is based on the auditory phase-opponency (PO) model proposed for the detection of tones in noise. The PO model includes a physiologically realistic mechanism for processing the information in neural discharge times, and it exploits the frequency-dependent phase properties of the tuned filters in the auditory periphery by using cross-auditory-nerve-fiber coincidence detection to extract temporal cues. An important feature of the PO model is that it does not need to estimate the noise characteristics, nor does it assume that the noise satisfies any statistical model. We modified the PO model (yielding the MPO model) so that its basic functionality is maintained but the properties of the model can be analyzed and modified independently. In addition, we improved on its performance by coupling the PO model with our Aperiodicity, Periodicity, Pitch (APP) detector. I will also present perceptual data demonstrating the effectiveness of the MPO-APP speech enhancement algorithm for people with hearing impairments. Presently, we are investigating additional processing to further improve the performance of the MPO-APP, especially at low signal-to-noise ratios. Some of these techniques involve variable frame rate analysis and the application of single-channel speech segregation techniques based on a new paradigm wherein the mixture signal is shared among the participating speakers, rather than divided among them as is done in present approaches.

Bio:
Dr. Carol Espy-Wilson is a Professor in the Electrical and Computer Engineering Department at the University of Maryland, where she directs the Speech Communication Lab. Her research interests include the integration of engineering, linguistics, and speech science to study speech communication. She is developing an approach to speech recognition based on landmarks and gestural phonology to address the limitations of present recognizers (e.g., effective handling of prosodically guided variability). She also conducts research in the areas of speech production, speech enhancement, speaker recognition, single-channel speaker separation, and language and genre detection in audio content analysis. Presently, Dr. Espy-Wilson is on sabbatical as a Radcliffe Fellow at Harvard University.

 
Copyright © 2005 International Computer Science Institute. All Rights Reserved.