Confidence-Based Scoring: A Useful Diagnostic Tool for Detection Tasks

This paper uses an unconventional analysis as a tool to diagnose the problems with three different speech activity detection systems. The analysis scores the frames in an audio file in order of confidence, starting with the frame we are most confident in and progressing towards less and less confident frames. By keeping track of the cumulative number of errors along this ordering, we can determine how the errors are distributed across the data. Using speech activity detection on highly degraded audio as a case example, we show how this simple analysis can yield useful insight into system performance. In our case example, we use the analysis to establish that (1) a small percentage of the frames account for a lion’s share of the errors, (2) three different systems perform very poorly on the same small subset of ‘hard’ data, and (3) the ‘hard’ data is primarily characterized by its proximity to speech-nonspeech boundaries. Through follow-up analyses, we show that the boundaries are ‘smoothly’ hard, and that scoring collars alone are not enough to handle the problem. Through this case example, we demonstrate the utility of confidence-based scoring as a general diagnostic tool for detection tasks on time-series data.
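The core of confidence-based scoring can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each frame has a speech posterior, takes "confidence" to be the posterior's distance from a 0.5 decision threshold, and uses hypothetical helper names (`confidence_ordered_errors`, `error_share_of_least_confident`, `distance_to_boundary`). The last helper is one possible way to probe whether low-confidence errors cluster near speech-nonspeech boundaries.

```python
import numpy as np


def confidence_ordered_errors(speech_prob, reference, threshold=0.5):
    """Score frames from most to least confident and accumulate errors.

    Assumption (hypothetical): confidence is the distance of the speech
    posterior from the decision threshold. Returns the frame order and the
    cumulative error count along that order.
    """
    speech_prob = np.asarray(speech_prob, dtype=float)
    reference = np.asarray(reference, dtype=bool)
    hypothesis = speech_prob >= threshold
    confidence = np.abs(speech_prob - threshold)
    order = np.argsort(-confidence, kind="stable")  # most confident first
    errors = (hypothesis != reference)[order]
    return order, np.cumsum(errors)


def error_share_of_least_confident(cum_errors, frame_fraction):
    """Fraction of all errors that fall in the least-confident frames."""
    n = len(cum_errors)
    total = int(cum_errors[-1]) if n else 0
    if total == 0:
        return 0.0
    k = int(round(frame_fraction * n))  # number of least-confident frames
    # Errors made on the n - k most-confident frames.
    errors_in_confident = int(cum_errors[n - k - 1]) if k < n else 0
    return (total - errors_in_confident) / total


def distance_to_boundary(reference):
    """Frames between each frame and the nearest reference speech/nonspeech change."""
    reference = np.asarray(reference, dtype=bool)
    frames = np.arange(len(reference))
    # Index of the first frame of each new reference segment.
    starts = np.flatnonzero(reference[1:] != reference[:-1]) + 1
    if starts.size == 0:
        return np.full(len(reference), np.inf)
    # Frames starts[i] - 1 and starts[i] both touch the boundary, so both get 0.
    dist = np.where(frames[:, None] >= starts[None, :],
                    frames[:, None] - starts[None, :],
                    starts[None, :] - 1 - frames[:, None])
    return dist.min(axis=1)
```

A steep rise at the end of `cum_errors` signals that a small, low-confidence subset of frames carries most of the errors. Indexing `distance_to_boundary(reference)[order]` then shows whether those late frames sit close to reference boundaries, and a mask such as `distance_to_boundary(reference) >= collar` mimics a scoring collar that excludes frames near boundaries.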


