Speaker Recognition / Identification by Humans - Forensic Data Science Laboratory

Speaker identification in courtroom contexts: Speaker identification by individual listeners and by groups of listeners

– Press release

Automatic speaker recognition technology outperforms human listeners in the courtroom (2022-11-02)

– Abstracts

Speaker identification in courtroom contexts – Part I: Individual listeners compared to forensic voice comparison based on automatic-speaker-recognition technology

Expert testimony is only admissible in common law if it will potentially assist the trier of fact to make a decision that they would not be able to make unaided. The present paper addresses the question of whether speaker identification by an individual lay listener (such as a judge) would be more or less accurate than the output of a forensic-voice-comparison system that is based on state-of-the-art automatic-speaker-recognition technology. Listeners listen to and make probabilistic judgements on pairs of recordings reflecting the conditions of the questioned- and known-speaker recordings in an actual case. Reflecting different courtroom contexts, listeners with different language backgrounds are tested: Some are familiar with the language and accent spoken, some are familiar with the language but less familiar with the accent, and others are less familiar with the language. Also reflecting different courtroom contexts: In one condition listeners make judgements based only on listening, and in another condition listeners make judgements based on both listening to the recordings and considering the likelihood-ratio values output by the forensic-voice-comparison system.

Speaker identification in courtroom contexts – Part II: Investigation of bias in individual listeners’ responses

In “Speaker identification in courtroom contexts – Part I”, individual listeners made speaker-identification judgements on pairs of recordings which reflected the conditions of the questioned-speaker and known-speaker recordings in a real case. The recording conditions were poor, and there was a mismatch between the questioned-speaker condition and the known-speaker condition. No contextual information that could potentially bias listeners’ responses was included in the experiment condition – it was decontextualized with respect to case circumstances and with respect to other evidence that could be presented in the context of a case. Listeners’ responses exhibited a bias in favour of the different-speaker hypothesis. It was hypothesized that the bias was due to the poor mismatched recording conditions. The present research compares speaker-identification performance between: (1) listeners under the original Part I experiment condition, (2) listeners who were informed ahead of time that the recording conditions would make the recordings sound more different from one another than they would have sounded had both been high-quality recordings, and (3) listeners who were presented with high-quality versions of the recordings. Under all experiment conditions, there was a substantial bias in favour of the different-speaker hypothesis. The bias in favour of the different-speaker hypothesis therefore appears to be an intrinsic bias rather than being due to recording conditions.

Speaker identification in courtroom contexts – Part III: Groups of collaborating listeners compared to forensic voice comparison based on automatic-speaker-recognition technology

Expert testimony is only admissible in common-law systems if it will potentially assist the trier of fact. In order for a forensic-voice-comparison expert’s testimony to assist a trier of fact, the expert’s forensic voice comparison should be more accurate than the trier of fact’s speaker identification. “Speaker identification in courtroom contexts – Part I” addressed the question of whether speaker identification by an individual lay listener (such as a judge) would be more or less accurate than the output of a forensic-voice-comparison system that is based on state-of-the-art automatic-speaker-recognition technology. The present paper addresses the question of whether speaker identification by a group of collaborating lay listeners (such as a jury) would be more or less accurate than the output of such a forensic-voice-comparison system. As members of collaborating groups, participants listen to pairs of recordings reflecting the conditions of the questioned- and known-speaker recordings in an actual case, confer, and make a probabilistic consensus judgement on each pair of recordings. The present paper also compares group-consensus responses with “wisdom of the crowd” which uses the average of the responses from multiple independent individual listeners.

– Presentation by Phil Weber:

Justifying AI in court: Human or machine analysis of evidence?

BrumAI - Birmingham Artificial Intelligence Meetup. 2024-04-24

https://youtu.be/NTleeUOkivo

– Participant information statements

Individual experiment: Australian-English listeners

Individual experiment: North-American-English listeners

Individual experiment: Spanish-speaking listeners

Group experiment: SONA

Group experiment: SONA-P

– Demonstration of experiment software from Part I

https://youtu.be/Y2mpI4ZuGS0

– Stimuli

Example questioned-speaker recording:

Example known-speaker recording:

Part I and Part II stimuli - 2022-10-17a.zip

Part III stimuli - 2023-06-30a.zip

– Results

Part I and Part II results - 2023-03-23a.zip

Part III results - 2023-07-17a.zip

– Analysis software

Part I, Part II, and Part III analysis software - 2023-07-18a.zip (Python)

Part I and Part II demographics analysis software - 2022-10-17a.zip (Matlab)

A method for calculating the strength of evidence associated with an earwitness’s claimed recognition of a familiar speaker

– V Congreso de Ciencia Forense, Universidad Autónoma de México

Recording of presentation by Claudia Rosas originally live-streamed 2021-10-08

Previous research on speaker recognition by earwitnesses has focused on factors that affect the accuracy of earwitnesses in general. The results of this research have allowed expert witnesses to make generalizations about whether the conditions of a case are more or less likely to lead to correct or incorrect identifications / recognitions. They have also provided guidance on how to design speaker lineups so as to increase accuracy and reduce bias. However, previous research has not provided a solution to the key question about the evidence in a particular case: What is the strength of evidence associated with this particular earwitness’s identification / recognition of this particular speaker under the particular conditions of this case?

This presentation describes and demonstrates a method for evaluating the strength of evidence when a witness claims to recognize a voice as the voice of a speaker who is known to them. The demonstration is based on the conditions of a real case in which a witness claimed to recognize a voice on a recording (the offender’s voice) as the voice of a person known to him (a suspect). The victim, who was in the trunk of a car, made a call to the emergency services using a mobile telephone. The call was recorded at the call centre. The offender’s voice was in the background of the recording (the offender was apparently sitting in the front seat of the car). The part of the recording during which the offender’s voice can be heard lasted approximately three seconds.

The method calculates a Bayes factor that answers the question: What is the probability that a cooperative earwitness would claim to recognize the offender as the suspect if the offender was the suspect, versus what is the probability that the earwitness would claim to recognize the offender as the suspect if the offender was not the suspect but some other speaker from the relevant population? The relevant data for the demonstration were the responses of naive listeners to recordings of speakers who were known to the listeners. The speakers were recorded under conditions reflecting those of the case. The recordings were presented to the listeners in a speaker lineup. In response to each recording, if the listener claimed to recognize the speaker, they were asked to write down the speaker’s name; otherwise, they were asked to state that they did not recognize the speaker. Bayes factors were calculated using these data and beta-binomial distributions with Jeffreys priors.
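The calculation described in the abstract above can be illustrated with a minimal Python sketch. It assumes that the Bayes factor for a single claimed recognition is taken as the ratio of two posterior-predictive probabilities, each obtained from a beta-binomial model with a Jeffreys Beta(0.5, 0.5) prior: one estimated from lineup trials in which the named speaker really was the speaker, the other from trials in which it was some other speaker. The function names and the counts are hypothetical and chosen only for illustration; this is not the presenters’ software and the numbers are not results from the study.

```python
def predictive_probability(claims: int, trials: int,
                           a: float = 0.5, b: float = 0.5) -> float:
    """Posterior-predictive probability that the next response is a claimed
    recognition, given `claims` claimed recognitions in `trials` lineup trials
    and a Beta(a, b) prior (a = b = 0.5 is the Jeffreys prior)."""
    if trials < 0 or not 0 <= claims <= trials:
        raise ValueError("need 0 <= claims <= trials")
    return (claims + a) / (trials + a + b)


def bayes_factor(claims_same: int, trials_same: int,
                 claims_diff: int, trials_diff: int) -> float:
    """Ratio of the probability of a claimed recognition if the speaker is the
    named person to the probability of the same claim if the speaker is someone
    else from the relevant population."""
    numerator = predictive_probability(claims_same, trials_same)
    denominator = predictive_probability(claims_diff, trials_diff)
    return numerator / denominator


# Hypothetical counts, invented for illustration only:
# 7 claimed recognitions in 9 same-speaker trials,
# 1 claimed recognition in 19 different-speaker trials.
bf = bayes_factor(claims_same=7, trials_same=9, claims_diff=1, trials_diff=19)
print(f"Bayes factor: {bf:.1f}")
```

With these made-up counts the two predictive probabilities are 7.5/10 and 1.5/20, so the Bayes factor is about 10. The prior keeps both probabilities away from 0 and 1, so the ratio stays finite even when no false recognitions are observed.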

Slides

https://youtu.be/Inna8YW-_mc?t=4420

Project Team

Laboratory members:

– Geoffrey Stewart Morrison

– Nabanita Basu

– Phil Weber

– Cuiling Zhang

– Claudia Rosas

Collaborators:

– Jorge Sommerhoff

Professor Emeritus, Instituto de Acústica, Universidad Austral de Chile

– Kristy A Martire

Associate Professor & Director of the Master of Psychology (Forensic) Program, University of New South Wales

– Agnes S Bali

Postdoctoral Researcher, School of Psychology, University of New South Wales

– Gary Edmond

Professor, School of Law, University of New South Wales

http://forensic-voice-comparison.net/speaker-recognition-by-humans/

This webpage is maintained by Geoffrey Stewart Morrison and was last updated 2024-12-02
