Forensic Automatic Speaker Recognition: Fiction or Science?
Joaquin Gonzalez-Rodriguez
Universidad Autonoma Madrid, Spain
Thursday, March 19, 2009
2:00-3:30 p.m.
The expectations of courts, juries and fact finders about what can be obtained from the scientific analysis of forensic evidence are usually unrealistic, as illustrated by the well-known CSI effect. Additionally, the actual analysis and reporting procedures in most forensic identification areas, including speaker recognition, have usually been far from truly scientific. However, three key factors have developed and consolidated over the last 20 years which clearly define the proper way to analyze and report forensic evidence. First, the requirements for admissibility of forensic evidence are becoming increasingly demanding, as exemplified by the US Daubert ruling (1993) and US Federal Rule of Evidence 702 (2000), forcing the older forensic identification sciences to move from expert-based approaches to more scientific, data-based ones. Second, an approach to forensic identification based on Bayesian inference of the identity of sources, which fully respects and clarifies the roles of the court and the scientist, is widely accepted as the best way to provide useful information to courts in the presence of uncertainty (always present in forensic cases), and is being progressively adopted by more and more scientists and laboratories across different countries and forensic identification disciplines. Last but not least, DNA typing has shown that it is possible to fulfill both previous requirements with a truly scientific, data-based approach. DNA typing has become the new gold standard that classical forensic identification sciences should emulate, and we will show in this talk that this approach can also be followed with voice evidence. However, in DNA profiling the sources of variation of DNA markers in a given population are better understood, and the ability to discriminate among individuals is much greater, than in speaker recognition (especially with forensic voice recordings). Therefore, caution is still needed: the information reported to a court will be a reliable estimate of the weight of the evidence only if the conditions under which the system was assessed and calibrated, and the type of speech used, reasonably match those in the forensic case at hand. This in turn requires proper knowledge of how the acoustic and/or linguistic features in use vary across the sociolinguistic populations involved in the case.
Bio:
Joaquin Gonzalez-Rodriguez received the M.S. degree in 1994 and the Ph.D. degree "cum laude" in 1999, both in electrical engineering, from Univ. Politecnica de Madrid (UPM), Spain. After 15 years of research and lecturing at UPM, he has been, since May 2006, an Associate Professor at the Computer Science Department at Univ. Autonoma de Madrid, where he leads the Speech group of the ATVS-Biometric Recognition Group. He has led ATVS participations in the NIST speaker (2001, 2002, 2004, 2005, 2006, and 2008) and language (2005 and 2007) recognition evaluations, and in the 2003 NFI-TNO forensic speaker recognition evaluation.
Dr. Gonzalez-Rodriguez has been an invited member of FSAAWG (Forensic Speech and Audio Analysis Working Group) at ENFSI (European Network of Forensic Science Institutes) since 2000, and has focused his research work on the proper use of automatic speaker recognition in forensic science. He is a member of ISCA and of the IEEE Signal Processing Society, and is also a member of the program committee of the ISCA Odyssey Conferences on Speaker and Language Recognition, having been vice-chair of Odyssey 2004 in Toledo, Spain.