Introduction
In February 2009 the National Research Council (NRC) Report to Congress on Strengthening Forensic Science in the United States found that:
- “[S]ome forensic disciplines are supported by little rigorous systematic research to validate the discipline’s basic premises and techniques. There is no evident reason why such research cannot be conducted” (p. 22).
- “The development of scientific research, training, technology, and databases associated with DNA analysis have resulted from substantial and steady federal support for both academic research and programs employing techniques for DNA analysis. Similar support must be given to all credible forensic science disciplines if they are to achieve the degrees of reliability needed to serve the goals of justice.” (p. 13)
Over the last decade, a small number of researchers (principally in Australia, Spain, and Switzerland) have been working on developing demonstrably valid and reliable forensic voice comparison with evidence evaluated using the same framework as is applied to the evaluation of DNA evidence.
Meanwhile, in the Americas, there has been little interest in this field of research.
The NRC report gives a new impetus for conducting forensic voice comparison research and holds out the hope for new funding opportunities in this area.
The 2nd Pan-American/Iberian Meeting on Acoustics provides an excellent opportunity to bring together researchers from Iberia and other parts of the world with researchers from the Americas to help foster research in this area in the Americas.
It also provides a venue for an exchange of ideas between researchers working on acoustic-phonetic and signal-processing approaches to forensic voice comparison.
Tutorial
Monday 15 November 2010, 7:00–9:00 pm
Coral Kingdom 2/3
(1EID1)
The tutorial will present an introduction to the forensic evaluation of acoustic evidence using the same framework as is applied to the evaluation of DNA evidence.
Both acoustic-phonetic and signal-processing approaches to forensic voice comparison will be described.
The focus will be on evidence in the form of voice recordings, but the evaluative framework can also be applied to other forms of evidence, including audio recordings of other types of acoustic events. The tutorial should therefore be of value to anyone interested in forensic acoustics in general, not just forensic voice comparison.
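For readers unfamiliar with the framework, a toy numerical sketch of a likelihood ratio may help (all numbers invented): the strength of the evidence is the probability of the observed measurement assuming the same-speaker hypothesis, divided by its probability assuming the different-speaker hypothesis.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * sqrt(2 * pi))

# Toy example: a single acoustic measurement from the questioned
# recording, evaluated against the suspect's distribution (similarity)
# and the relevant-population distribution (typicality).
x = 1150.0                                     # measured value (e.g. Hz)
p_same = normal_pdf(x, mean=1140.0, sd=40.0)   # suspect model
p_diff = normal_pdf(x, mean=1300.0, sd=120.0)  # population model

lr = p_same / p_diff
# lr > 1 supports the same-speaker hypothesis, lr < 1 the
# different-speaker hypothesis; the magnitude gives the strength.
```

Real forensic-voice-comparison systems evaluate multivariate features over many tokens, but the structure of the calculation is the same.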
Presenters:
- Geoffrey Stewart Morrison
- Director, Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales
- Daniel Ramos
- Assistant Professor, Biometric Recognition Group, Autonomous University of Madrid – Universidad Autónoma de Madrid
Both presenters are invited lecturers in the Judicial Phonetics Specialization in the Masters in Phonetics and Phonology Program of the Consejo Superior de Investigaciones Científicas [Spanish National Research Council] / Universidad Internacional Menéndez Pelayo.
They previously presented a similar tutorial at the International Speech Communication Association’s Interspeech 2008 conference.
The tutorial will be presented in English, but both presenters can also field questions in Spanish.
Lecture notes:
A pdf of the slides for the tutorial presentation is available here (updated 14 Nov 2010).
In conjunction with this tutorial, the publisher of Morrison’s new introduction to forensic voice comparison
- Morrison, G.S. (2010). Forensic voice comparison. In I. Freckelton, & H. Selby (Eds.), Expert Evidence (Ch. 99). Sydney, Australia: Thomson Reuters.
will make pdf downloads available at half-price between 8 November and 3 December 2010. When ordering use promotion code: EEF2010
For more information see http://expert-evidence.forensic-voice-comparison.net/
ASA Forensic Acoustics Group
Tuesday 16 November 2010, 7:30–9:30 pm
Coral Kingdom 1
Meeting to propose the establishment of and organize a Forensic Acoustics Group within the Acoustical Society of America.
Special Session
Wednesday 17 November 2010
Sponsored by the Speech Communication Technical Committee.
Papers and posters on acoustic-phonetic and signal-processing approaches to demonstrably valid and reliable forensic evaluation of audio recordings of human voices and other acoustic events.
Invited presentations:
Wednesday 17 November 2010, 8:00–11:40 am
Grand Coral 1A
Lecture presentations listed in order of presentation.
- Andrzej Drygajlo
- Speech Processing and Biometrics Group, Swiss Federal Institute of Technology at Lausanne – École Polytechnique Fédérale de Lausanne
- Dr Drygajlo is unfortunately unable to attend the meeting due to unforeseen last-minute circumstances. This presentation will be given on his behalf by Dr Michael Jessen.
- Value and interpretation of biometric evidence in forensic automatic speaker recognition
(3aSC1, 8:05)
- Forensic speaker recognition (FSR) is the process of determining if a specific individual (suspected speaker) is the source of a questioned voice recording (trace). Forensic automatic speaker recognition (FASR) has proven an effective tool in the fight against crime, yet there is a constant need for more research due to the difficulties in adapting automatic methods of voice comparison to the forensic methodology that provides a coherent way of assessing and presenting recorded speech as scientific evidence. The ongoing paradigm shift in forensic speaker recognition requires biometric methods for the calculation of the value of the evidence, its strength, and the evaluation of this strength under the operating conditions of the casework. In such methods, the biometric evidence consists of the quantified degree of similarity between speaker-dependent features extracted from the trace and speaker-dependent features extracted from recorded speech of a suspect, represented by his/her model. This presentation aims to introduce deterministic and statistical automatic speaker recognition (ASR) methods that provide several ways of quantifying and presenting recorded voice as biometric evidence, as well as the assessment of its strength (likelihood ratio) in the Bayesian interpretation framework, including scoring and direct methods, compatible with interpretations in other forensic disciplines.
- Slides
- Didier Meuwly
- Netherlands Forensic Institute – Nederlands Forensisch Instituut
- Forensic speaker recognition: Comparison and validation of automatic systems over 3 generations
(3aSC2, 8:30)
- The first aim of this paper is to demonstrate the improvement of automatic speaker recognition systems used for forensic evaluation over a period of 12 years. The second aim consists of exploring how the results of different systems can be compared and their improvement measured. The same set of experiments is replicated on 3 different systems: the original LR-based ASR system from 1998 (ASPIC I), the EPFL ASR system of 2004 (ASPIC II), and the Agnitio ASR system (BATVOX) from 2010. The reference database, Polyphone, consisting of 2000 male and 2000 female speakers from the French-speaking part of Switzerland, and the forensic database, Polyphone-IPSC, consisting of 16 male and 16 female speakers from the same region, were used to test the following forensic conditions: spontaneous speech, disguised speech, PSTN, GSM, signal-to-noise ratios from +30 dB to −3 dB, digital and analogue recording, and a closed set of family-related speakers. The results are visualized using Tippett plots. The refinement and calibration of the systems are measured and compared using the log-likelihood-ratio cost (Cllr). Finally, the value, in terms of forensic validation, of the methodology used and the results produced is discussed.
- Dr Meuwly is unfortunately unable to attend the meeting due to unforeseen last-minute circumstances. The following presentation will be substituted for his presentation.
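Several presentations in this session evaluate systems using the log-likelihood-ratio cost (Cllr). As a point of reference, here is a minimal sketch of the standard Cllr formula, applied to invented likelihood ratios:

```python
import numpy as np

def cllr(lr_same, lr_diff):
    """Log-likelihood-ratio cost.

    lr_same: likelihood ratios from known same-speaker test comparisons
    lr_diff: likelihood ratios from known different-speaker test comparisons
    """
    lr_same = np.asarray(lr_same, dtype=float)
    lr_diff = np.asarray(lr_diff, dtype=float)
    # Penalise small LRs for same-speaker pairs and large LRs for
    # different-speaker pairs; Cllr = 1 corresponds to a system that
    # provides no useful information.
    c_same = np.mean(np.log2(1.0 + 1.0 / lr_same))
    c_diff = np.mean(np.log2(1.0 + lr_diff))
    return 0.5 * (c_same + c_diff)

cllr([100.0, 50.0], [0.01, 0.02])  # low cost: a well-performing system
cllr([1.0], [1.0])                 # exactly 1: an uninformative system
```

Lower values indicate better validity; miscalibrated or misleading likelihood ratios drive the cost above 1.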
- Geoffrey Stewart Morrison
- Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales
- Calibration and fusion
(3aSC2 substitution, 8:30)
- Automatic-speaker-recognition systems typically output scores. These scores provide information about the similarity of two recordings taking into account their typicality with respect to the background sample, but they can only be interpreted relative to one another: a larger score indicates greater support for the same-speaker hypothesis relative to a smaller score, but the absolute value of a score has no meaning. Calibration is a procedure for converting scores to likelihood ratios, a requirement for forensic voice comparison. A popular calibration technique is logistic regression, which can also be used to fuse and calibrate parallel sets of scores. Parallel sets of scores can result from running multiple forensic-voice-comparison systems on the same set of recordings, or running a single acoustic-phonetic forensic-voice-comparison system on different phonetic units within the same set of recordings. Logistic-regression calibration and fusion are described with examples drawn from automatic and acoustic-phonetic forensic voice comparison.
- Slides
- This talk was originally scheduled as part of the tutorial.
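The score-to-likelihood-ratio mapping described in this abstract can be sketched with logistic regression (here via scikit-learn; the score distributions are invented, and equal numbers of same- and different-speaker training scores are assumed so that the posterior log-odds equal the log likelihood ratio):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented scores from a hypothetical forensic-voice-comparison system.
rng = np.random.default_rng(0)
same_scores = rng.normal(2.0, 1.0, 200)   # known same-speaker comparisons
diff_scores = rng.normal(-2.0, 1.0, 200)  # known different-speaker comparisons

X = np.concatenate([same_scores, diff_scores]).reshape(-1, 1)
y = np.concatenate([np.ones(200), np.zeros(200)])

# Logistic regression learns a scale and shift mapping score to
# posterior log-odds; with balanced training data these equal the
# (natural-log) likelihood ratio.
clf = LogisticRegression().fit(X, y)
a, b = clf.coef_[0, 0], clf.intercept_[0]

def score_to_llr(score):
    """Calibrated natural-log likelihood ratio for a raw score."""
    return a * score + b
```

Fusion works the same way with multiple score columns in `X`: the regression learns one weight per system plus a shift.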
- Philip Rose
- School of Language Studies, Australian National University
- Combining linguistic and non-linguistic information in likelihood-ratio-based forensic voice comparison: A hybrid automatic-traditional system
(3aSC3, 8:55)
- In the last decade, forensic voice comparison has experienced a remarkable paradigm shift [Morrison, Sci. Justice 49, 298–308 (2009)]. Both automatic and traditional phonetic approaches have been developed within the new paradigm. The main difference is that traditional approaches are typically local in both time and frequency domains, with features like formant frequencies extracted from linguistically comparable items (e.g., words or phonemes), whereas automatic approaches are typically global, with long-term spectral properties used and linguistic information treated as noise. Since neither makes use of all the information present, combining them could improve performance. A fully-automatic and a partially-traditional system were compared. Data were pairs of non-contemporaneous landline-telephone recordings of 60 speakers from the Japanese National Research Institute of Police Science database (net 35–40 s speech per recording). In the fully-automatic system the whole speech-active portion of the recording was analyzed using 12th order LPCCs, mean cepstral subtraction, GMM-UBM, and logistic-regression calibration. In the partially-traditional system the same procedures were applied only to tokens of [oː], [N], and [ɕ] extracted from the recordings, with logistic-regression fusion of the results. The performance of each system and the fusion of the two were compared using the log-likelihood-ratio cost (Cllr).
- Slides
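The GMM-UBM scoring step mentioned in the abstract can be sketched as follows (using scikit-learn; all data are invented, the features are 2-D rather than high-dimensional cepstra, and the MAP-adaptation step used in real systems is simplified away):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
# Invented 2-D "feature vectors"; real systems use e.g. 12th-order LPCCs.
background = rng.normal(0.0, 2.0, (2000, 2))  # many speakers: UBM data
suspect = rng.normal(1.0, 1.0, (300, 2))      # suspect recording
offender = rng.normal(1.0, 1.0, (300, 2))     # questioned recording

# Train a universal background model (UBM) and a suspect model.
# (Real GMM-UBM systems derive the speaker model from the UBM by
# MAP adaptation; here the suspect model is simply trained directly.)
ubm = GaussianMixture(n_components=4, random_state=0).fit(background)
spk = GaussianMixture(n_components=4, random_state=0).fit(suspect)

# GMM-UBM score: average log-likelihood of the questioned recording
# under the suspect model minus that under the background model.
# A positive score favours the same-speaker hypothesis.
score = spk.score(offender) - ubm.score(offender)
```

The resulting scores would then be converted to likelihood ratios by logistic-regression calibration, as in both systems the abstract describes.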
- Michael Jessen, Timo Becker
- Department of Speaker Identification and Audio Analysis, German Federal Police Office – Bundeskriminalamt
- Long-Term Formant Distribution as a forensic-phonetic feature
(3aSC4, 9:20)
- With the Long-Term Formant Distribution (LTF) method [F. Nolan and C. Grigoras, Int. J. Speech, Lang. Law 12, 142–173 (2005)], manually-corrected LPC-based formant tracks are extracted over all vocalic portions of the recording of a speaker in which the formants F2 and F3 are sufficiently well-structured. LTF analysis has been successfully added to the inventory of phonetic features that are used in voice comparison casework. Current research in our lab has highlighted a number of advantages of the LTF method, including high inter-expert reliability (different phoneticians using the method arrive at highly consistent results), anatomical motivation (long-term F2 and F3 are negatively correlated with speaker height), and language independence (LTF patterns in different languages – so far German, Russian, and Albanian – do not differ significantly). Presently, quantitative measures of inter-individual variation in case data are being investigated, including equal error rates (EER) and calibrated likelihood ratios (LR) based on Gaussian mixture modeling (GMM) of the raw formant-tracking data [T. Becker, M. Jessen and C. Grigoras, Proc. Interspeech 2008, 1505–1508]. The final inter-individual variation results will be presented, along with results on how the LTF method compares to automatic speaker recognition applied to the same data.
- Slides
- Daniel Ramos, Javier González-Domínguez, Joaquín González-Rodríguez
- Biometric Recognition Group, Autonomous University of Madrid – Universidad Autónoma de Madrid
- High-Performance session variability compensation in forensic automatic speaker recognition
(3aSC5, 9:45)
- Recently, the main performance improvement in automatic speaker recognition technology has come from session variability compensation techniques, mainly based on factor analysis (FA), which have reduced the equal error rate (EER) of state-of-the-art systems by a factor of ten in less than five years (e.g., EER < 2% for NIST SRE 2008 telephone speech). Moreover, once speech features are extracted, such systems are able to compute millions of comparisons thousands of times faster than real time. However, some challenges remain: if there is a mismatch between the conditions of the FA training database and the speech used for comparison, the effectiveness of the compensation decreases significantly. This problem is especially relevant in forensic voice comparison, where speech matching the operational conditions is usually scarce. In this presentation we show the impact of this effect in realistic simulated case studies. We use the Baeza - Ahumada IV database, which contains speech acquired with the facilities that the Spanish Guardia Civil uses in its daily work. We also present algorithms to handle sparsity in the data used for training FA models. Finally, we outline future research plans for improving session variability compensation performance in forensically realistic conditions.
- Slides
(Break 10:10–10:25)
- Geoffrey Stewart Morrison1, Julien Epps1, Philip Rose2, Tharmarajah Thiruvaran1, Cuiling Zhang3
- 1Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales
- 2School of Language Studies, Australian National University
- 3Department of Forensic Science & Technology, China Criminal Police University
- Measuring reliability in forensic voice comparison
(3aSC6, 10:25)
- Recently there has been a great deal of concern in forensic science about validity and reliability (accuracy and precision). The log-likelihood-ratio cost (Cllr), developed for automatic speaker recognition, is increasingly applied as a standard measure of accuracy in forensic voice comparison, but so far there has been little work on developing a metric of precision within this field. Because voice data can have a large amount of intrinsic variation at the source, and likelihood ratios are typically calculated using a single suspect recording and a single offender recording, assessing the precision of a forensic-voice-comparison system is extremely important. This presentation discusses the importance of measuring precision and describes two procedures, one parametric and one non-parametric, for calculating 95% credible intervals for the likelihood ratios resulting from running tests of forensic-voice-comparison systems (in which some comparisons are known to be same-speaker comparisons and others are known to be different-speaker comparisons). Examples are drawn from both acoustic-phonetic and automatic forensic-voice-comparison systems.
- Slides
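The two kinds of interval mentioned in the abstract can be illustrated on invented data (the procedures in the paper operate on likelihood ratios from system tests; this sketch only shows the parametric-normal and percentile calculations themselves):

```python
import numpy as np

# Invented log10 likelihood ratios for one comparison, obtained from
# repeated tests of a hypothetical system (e.g. on resampled background
# data); the spread reflects the system's precision.
rng = np.random.default_rng(1)
log10_lrs = rng.normal(loc=2.0, scale=0.4, size=1000)

# Non-parametric 95% interval: the 2.5th and 97.5th percentiles.
lo_np, hi_np = np.percentile(log10_lrs, [2.5, 97.5])

# Parametric 95% interval: assume the log LRs are normally distributed.
m, s = log10_lrs.mean(), log10_lrs.std(ddof=1)
lo_p, hi_p = m - 1.96 * s, m + 1.96 * s
```

A wide interval around a reported likelihood ratio signals that the value should be presented to the court with caution, however large its point estimate.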
- Jeff Boyczuk
- Audio and Video Analysis Section, Royal Canadian Mounted Police – Gendarmerie royale du Canada
- Factors affecting the intelligibility of recorded speech: Considerations for forensic audio “best evidence”
(3aSC7, 10:50)
- Derived from a traditional common law rule, the “best evidence” standard as applied to recorded audio prescribes that an original recording, and not a duplicated or altered copy, will be presented in legal proceedings. The intent of this standard is to ensure the integrity of the original evidence is preserved, such that a court is reasonably assured it is being presented with the most complete and accurate record of the recorded evidence. However, when considering forensic audio recordings of speech, that are frequently made in adverse acoustic environments, presentation of such recordings in their original form may not afford a court with the opportunity for a complete and accurate assessment of the evidence in question – namely, what words are being spoken on the recording? The current paper summarizes the technological and listener-based factors that should be considered when speech intelligibility is of prime importance in meeting the best evidence standard for presentation of forensic audio in court proceedings. Illustrative examples from recent court cases will be provided.
- Slides
- Ray Bull
- Forensic Section, School of Psychology, University of Leicester
- Witnesses’/Victims’ recognition of a once-heard voice
(3aSC8, 11:15)
- This presentation summarises the results of three decades of research testing the validity of lay persons’ (e.g. witnesses’/victims’) ability to recognize the voice of a once-before heard stranger (e.g. a crime perpetrator). Studies around the world have consistently found that people are usually very poor at this task, even with short delays and adequate lengths of speech. Some courts have taken notice of this research and caution witnesses accordingly; however, in many cases voice line-ups have been poorly constructed and are therefore invalid. The final part of the presentation provides an account of some court cases in which I have participated as an expert.
- Slides
Contributed presentations:
Wednesday 17 November 2010, 1:00–3:00 pm
Grand Coral 3
All posters will be on display from 1:00 to 3:00 pm. To allow contributors an opportunity to see other posters, contributors of odd-numbered papers will be at their posters from 1:00 to 2:00 pm and contributors of even-numbered papers will be at their posters from 2:00 to 3:00 pm.
- Jeff Boyczuk1, David Luknowsky1, Bradford Gover2, Heping Ding3
- 1Audio and Video Analysis Section, Royal Canadian Mounted Police – Gendarmerie royale du Canada
- 2Institute for Research in Construction, National Research Council Canada – Conseil national de recherches Canada
- 3Institute for Microstructural Sciences, National Research Council Canada – Conseil national de recherches Canada
- Improving the speech intelligibility of forensic audio recordings through adaptive filtering with non-synchronous interference signals
(3pSC1)
- Forensic audio recordings are frequently made in uncontrollable acoustic environments where background sound emanating from television, radio, and video or music playback may interfere with the intelligibility of the intended “target” speech on a recording. In such cases, adaptive filtering techniques have proven highly effective in eliminating the interfering sound sources and improving intelligibility, provided that the interfering reference signal was acquired simultaneously with the target speech. However, in cases where interfering signals are acquired through a post hoc retrieval of broadcast, music or video recordings, non-linear time base differences between the original and the secondarily-acquired reference may significantly lessen the effectiveness of conventional adaptive filtering techniques in improving speech intelligibility. The current paper describes the results of applying a commercially-available adaptive filtering tool, as well as a newly developed tool, Drift-Compensated Adaptive Filtering (DCAF), for improving the intelligibility of recorded speech when utilizing a non-synchronously acquired reference signal. Listening tests show an overall improvement in speech intelligibility through the application of adaptive filtering with non-synchronous reference signals, with greater intelligibility for DCAF-processed audio recordings as compared to recordings processed with conventional adaptive filtering techniques.
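The conventional (synchronous-reference) adaptive filtering that the abstract takes as its baseline can be sketched with a normalised-LMS noise canceller; all signals below are synthetic, and the drift compensation that DCAF adds for non-synchronous references is beyond this sketch:

```python
import numpy as np

def nlms_cancel(primary, reference, taps=32, mu=0.5, eps=1e-8):
    """Normalised-LMS adaptive noise canceller (illustrative sketch).

    primary:   microphone signal = target speech + filtered interference
    reference: synchronously acquired copy of the interfering source
    Returns the error signal, i.e. the estimate of the target speech.
    """
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps, len(primary)):
        x = reference[n - taps + 1:n + 1][::-1]  # most recent sample first
        y = w @ x                                # interference estimate
        e = primary[n] - y                       # error = cleaned sample
        w += mu * e * x / (x @ x + eps)          # NLMS weight update
        out[n] = e
    return out

# Synthetic demonstration with invented signals.
rng = np.random.default_rng(3)
interference = rng.normal(0.0, 1.0, 8000)
target = 0.1 * np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
# The interference reaches the microphone through a short unknown channel.
primary = target + np.convolve(interference, [0.8, -0.3, 0.1])[:8000]

cleaned = nlms_cancel(primary, interference)
```

With a non-synchronous reference the tap alignment assumed here drifts over time, which is exactly the failure mode the paper addresses.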
- Cassie Dallasarra, Aericka Dunn, Shanna White, Al Yonovitz, Joe Herbert
- Department of Communication Science and Disorders, University of Montana
- Speaker Identification: Effects of noise, telephone bandwidth and word count on accuracy
(3pSC2)
- Identifying individuals by their voice is of great interest to the legal system. Controversy in this area has continued for nearly five decades, with states divided with regard to the merits of voice identification. The American Board of Recorded Evidence [ABRE (1999)] has established standards for the determination of identification or elimination of speakers. Digital spectrographic techniques, including formant tracking and finer descriptive measures of speech, are dramatic improvements and allow a test of the ABRE standards using improved technology embodying principles similar to those set forth in the aural and spectrographic method. Ten speakers recorded synthetic sentences at two different times. Words were distorted by mixing with noise or by telephone bandwidth reduction. All possible “elimination” speaker pairs were presented to qualified listeners, with equal probability of “identification” pairs. For the comparison, subjects were presented with spectrograms, formant tracks, fundamental frequency, and the ability to listen to single words. Confidence ratings and determinations (elimination or identification) were made for each comparison word as additional words (<20) were added. The results will be discussed with regard to correct classification as a function of the number of words required and the reduction in correct classification under the distorted conditions.
- Sandra Ferrari Disner1, Sean A. Fulop2, Fang-Ying Hsieh1
- 1Department of Linguistics, University of Southern California
- 2Department of Linguistics, California State University Fresno
- The fine structure of phonation as a biometric
(3pSC3)
- The reassigned spectrogram, an enhanced time-frequency display that has been proposed as a means of focusing in on the details of the phonation process [Fulop & Disner, J. Acoust. Soc. Am. 125, 2530 (2009)], was used to examine a set of sustained vowels produced by 16 speakers of American English. Certain characteristics, such as the relative amplitudes of the formants, the presence or absence of a so-called ‘voice bar’, and evidence of secondary excitation within a single glottal period, appear to be rather stable across repetitions of the same vowel by the same speaker, but divergent across speakers. The possibility of this method being used to corroborate conventional methods of forensic speaker identification is thus afforded a measure of support.
- Cuiling Zhang1, Geoffrey Stewart Morrison2
- 1Department of Forensic Science & Technology, China Criminal Police University
- 2Forensic Voice Comparison Laboratory, School of Electrical Engineering & Telecommunications, University of New South Wales
- Accuracy and precision of forensic voice comparison using the Chinese /iao/ triphthong
(3pSC4)
- Some studies on forensic voice comparison have fitted parametric curves to the formant trajectories of diphthongs spoken in controlled phonetic environments, and have obtained results with a high degree of validity. The present study fits parametric curves to the formant trajectories of tokens of the Standard-Chinese /iao/ triphthong extracted from telephone conversations in which there was no control over the phonetic context. Two non-contemporaneous recordings from each of 60 female speakers were analysed. Likelihood ratios were calculated for a test set in which some comparisons are known to be same-speaker comparisons and others known to be different-speaker comparisons. The accuracy and precision (validity and reliability) of the test results were calculated using the log-likelihood-ratio cost (Cllr) and an estimate of their 95% credible interval respectively.
- Poster
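The trajectory-parameterisation step can be illustrated with a synthetic formant track (values invented; the study itself fits parametric curves to measured /iao/ formant trajectories):

```python
import numpy as np

# Synthetic F2 trajectory (Hz) at 10 equally spaced points across a token.
t = np.linspace(0.0, 1.0, 10)
f2 = 2100 - 600 * t + 300 * t ** 2

# Fit a quadratic: three coefficients summarise the whole trajectory and
# can serve as input to a likelihood-ratio calculation.
coeffs = np.polyfit(t, f2, deg=2)   # highest-order coefficient first

# The fitted curve reconstructs this noise-free trajectory exactly.
f2_hat = np.polyval(coeffs, t)
```

Fitting the same parametric form to every token makes tokens of different durations directly comparable, which is what allows likelihood ratios to be computed from uncontrolled conversational speech.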
- Harry Hollien, James D. Harnsberger
- Department of Linguistics, University of Florida
- Speaker identification: The case for speech vector analysis
(3pSC5)
- The problem of identifying speakers from voice analysis is a serious one. Many analytical procedures have been proposed; most have been based on signal analysis algorithms. Yet it is clear that human perceivers often can make accurate identifications (even under difficult circumstances) and extensive research supports this position. How do they accomplish this task? They do so by acoustic and temporal assessment of the talker’s speech/voice signal. Several analysis procedures have attempted to mimic human perception for this purpose and a brief review of the relevant data will be presented. It will be followed by presentation of the results of a four-vector experiment in which each vector integrates 3–5 speech parameters. Identification of 28 male voices resulted from three replications in a field of 10 foil voices. It was found that identification scores for the voice, vowel, and fundamental frequency vectors were high, and that for the temporal vector was modest. Moreover, of the summation scores across all vectors and replications, only one failed to identify the target speaker. While these results were based on high quality audio recordings, it can still be argued that robust speaker identification is possible from a limited set of speech and voice characteristics.
- Daniel García-Romero, Carol Espy-Wilson
- Department of Electrical Engineering, University of Maryland
- Automatic speaker recognition: Advances towards informative systems
(3pSC6)
- Joint factor analysis (JFA) has become the state-of-the-art in automatic speaker recognition systems. In this paradigm, the information contained in a variable-length speech recording is summarized as a fixed-length supervector by means of a soft partition of the acoustic space through a Gaussian mixture model. Moreover, an explicit mechanism to account for the speaker and undesired (inter-session) variability in terms of a small set of factors results in very accurate answers to the question of whether two speech samples are uttered by the same speaker or not. However, no apparent answer to the question of what exactly it is that makes two particular voices similar or different is obtained by this approach. To address this issue of interpretability, we propose the modification of the standard JFA approach in two ways. First, by explicitly incorporating phonetic information in the construction of the supervectors so that subsets of its entries get associated with specific phonetic contexts. Second, by modifying the estimation of the speaker and inter-session factors so that phonetically contextualized factors are obtained. A study of the recognition accuracy as well as the interpretability of the results of the proposed approach will be performed on the NIST 2008 speaker recognition evaluation.
- Ewald Enzinger
- Acoustics Research Institute, Austrian Academy of Science
- Measuring the effects of adaptive multi-rate (AMR) codecs on formant tracker performance
(3pSC7)
- Several approaches to forensic speaker comparison rely on formant centre-frequency measurements as features due to their rather straightforward interpretation as resonance frequencies of the cavities of the human vocal tract. Formant tracking algorithms, mostly based on linear predictive coding (LPC), are commonly used for automatic extraction. Telephone conversations constitute a substantial amount of forensic material, which increasingly involves wireless communication channels instead of landline transmission. The effects and limitations introduced by the Adaptive Multirate (AMR) set of codecs that is used for speech transmission in GSM and UMTS networks are therefore of special interest in forensic settings. To evaluate the extent of the effects that are caused solely by the codecs, speech recordings were en- and decoded with the different bitrate levels provided by the AMR (narrowband) codec. The formant frequencies of vowel segments were extracted using different trackers and settings. The preliminary results suggest partial shifts in frequency depending on codec level and individual speakers, but no consistent trend emerges.
- Eugenia San Segundo Fernández
- Laboratorio de Fonética, Consejo Superior de Investigaciones Científicas – Phonetics Laboratory, Spanish National Research Council / Universidad Internacional Menéndez Pelayo
- Parametric representations of the formant trajectories of Spanish vocalic sequences for likelihood-ratio-based forensic voice comparison
(3pSC8)
- Non-contemporaneous speech samples from 30 Spanish male speakers were compared within the forensic-likelihood-ratio framework. The acoustic parameters studied were the formant trajectories of a series of vocalic sequences, /ue/ /ie/ /ia/ /ai/ (pronounced as diphthongs and in hiatus), in order to analyze their suitability for forensic voice comparison. Following Morrison [J. Acoust. Soc. Am. 125, 2387–2397 (2009)], parametric curves (polynomials and discrete cosine transforms) were fitted to these formant trajectories. The estimated coefficient values from the parametric curves were used as input to a multivariate-kernel-density formula for calculating likelihood ratios expressing the probability of obtaining the observed differences between two speech samples under two opposing hypotheses: that the samples were produced by the same speaker and that the samples were produced by different speakers. Cross-validated likelihood-ratio results from systems based on different parametric curves were calibrated and evaluated using the log-likelihood-ratio-cost function (Cllr). The cross-validated likelihood ratios from the best-performing system for each vocalic sequence were fused using logistic regression.
- Alejandro Wang
- Masters in Phonetics and Phonology Program, Consejo Superior de Investigaciones Científicas – Spanish National Research Council / Universidad Internacional Menéndez Pelayo
- Forensic voice comparison based on nasal formants
(3pSC9)
- One of the main problems faced in forensic voice comparison is voice disguise, most intuitively carried out by lowering F0, producing falsetto, or pinching the nose. For the first two forms of disguise, analyzing nasal formants could be relevant, as the nasal cavities are not normally modified by suspects. A formant detector was developed in order to accurately detect these formants in nasal consonants and nasalized vowels from two non-contemporaneous recordings from 30 Spanish male speakers. The recorded samples were compared within the likelihood-ratio framework; the results were validated using the log-likelihood-ratio cost function (Cllr).
- Mr Wang is unfortunately unable to attend the meeting due to unforeseen last-minute circumstances. This presentation has been cancelled.
- Christin Kirchhübel
- Department of Electronics, University of York
- The effects of Lombard speech on vowel formant measurements
(3pSC10)
- This study analyses the effects of Lombard speech on vowel formant frequencies. Ten male native German speakers were selected from the ‘Pool 2010’ corpus which was recorded at the Bundeskriminalamt (BKA), Germany. Spontaneous speech produced in a neutral setting and Lombard setting, where 80 dB of noise was played through headphones, was analysed. Measurements of F1, F2 and F3 were collected from 10 vowel categories for every speaker in both conditions. The results agree with previous findings in that F1 is consistently higher in the Lombard condition. The effect on F2 is very variable and complex. F3 was less affected than F1 and F2, but changes were present, especially for speakers with low F3s in modal speech. Differences could be observed among vowel categories. Inter-speaker variability was found to be large with respect to the size of increase in F1 and the direction and size of change in F2. The findings are discussed in light of the articulatory changes that have been associated with Lombard speech and the implications for forensic speaker comparison are spelled out.
- Poster
Other presentations:
Tuesday 16 November 2010, 8:05–8:45 (the day before the Special Session)
Coral Garden 1
The following presentations are not part of the Special Session on Forensic Voice Comparison and Forensic Acoustics, but are related to forensic voice comparison.
Felipe Rolando Menchaca García
IPN
Speaker recognition infrastructure for legal context in Mexico City
(2aSP2, 8:05)
Speaker recognition is now a mature technology; it is among the alternatives considered highly reliable for biometric identification, access control for computer services, and so on. However, applications in legal settings, particularly as part of judicial processes, remain problematic, especially in an environment like Mexico City. High-quality infrastructure for determining with high precision whether a voice recording comes from a specific person is a topic that has to be regulated, as do expert studies of this kind. This paper addresses the need to develop local regulation of infrastructure characteristics and standard test methods in this area. Such standard tests must principally take the local language into account.
- This paper was not presented.
- Claudia Rosas1, Jorge Somerhoff2
- 1Instituto de Lingüística y Literatura, Facultad de Filosofía y Humanidades, Universidad Austral de Chile
- 2Instituto de Acústica
- Design and evaluation of reference populations for forensic purposes
(2aSP3, 8:25)
- The present work analyzed the effect of sex, noise, vocal register, and channel on the performance of Batvox, the automatic speaker recognition system employed by the Forensic Laboratory of the Investigative Police of Chile. This study transferred and applied the methodology and results of a recently completed research project [Regular Fondecyt No. 1,070,210, funded by the National Commission for Scientific and Technological Research of the Government of Chile] to develop a model for the generation of speech reference populations for forensic purposes, which incorporated a set of dialect and environmental variables not considered in the samples currently integrated into their databases. The information supplied provided methodological criteria for developing an optimized database for the biometric identification of speakers in Chile and for improving the performance of the systems that use it.
Press Room
The following papers related to the Tutorial and Special Session have been included in the Acoustical Society of America’s World Wide Press Room:
Links
Acoustical Society of America
Iberoamerican Federation of Acoustics – Federación/Federação Iberoamericana de Acústica
Mexican Institute of Acoustics – Instituto Mexicano de Acústica
Forensic Voice Comparison parent website