The Auditory Organization of Speech in Listeners and Machines

Publication Type: Technical Report
Year of Publication: 1998
Authors: Cooke, M. P., & Ellis, D. P. W.
Other Numbers: 1137

Speech is typically perceived against a background of other sounds. Listeners are adept at extracting target sources from the acoustic mixture reaching the ears. The auditory scene analysis account holds that this feat is the result of a two-stage process. In the first stage, sound is decomposed both within and across auditory nuclei. In the second stage, processes of perceptual organization, informed both by cues which suggest a common source of origin and by prior experience, operate on the decomposed auditory scene to extract coherent evidence for one or more sources for subsequent processing. Auditory scene analysis in listeners has been studied for several decades, and recent years have seen a steady accumulation of computational models of perceptual organization. The purpose of this review is to describe the evidence for auditory organization in listeners and to explore the computational models which have been motivated by such evidence. The primary focus is on speech rather than on sources such as polyphonic music or nonspeech ambient backgrounds, although these other domains are equally amenable to auditory organization. The review concludes with a discussion of the relationship between auditory scene analysis and alternative approaches to sound source segregation.

ICSI Technical Report TR-98-016

M. Cooke and D. P. W. Ellis
