Statistical Models in Forensic Voice Comparison
Handbook of Forensic Statistics
This webpage provides material related to:
Abstract
Table of Contents
1 Introduction
2 Feature extraction
2.1 Mel-frequency cepstral coefficients (MFCCs)
2.2 Deltas and double deltas
2.3 Voice-activity detection (VAD) and diarization
3 Mismatch compensation in the feature domain
3.1 Cepstral-mean subtraction (CMS) and cepstral-mean-and-variance normalization (CMVN)
3.2 Feature warping
4 GMM-UBM
4.1 Training the relevant-population model (the UBM): Expectation maximization (EM) algorithm
4.2 Training the known-speaker model: Maximum a posteriori (MAP) adaptation
4.3 Calculating a score
4.4 Remarks regarding UBM training data
5 i-vector PLDA
5.1 i-vectors
5.2 i-vector domain mismatch compensation (LDA)
5.3 PLDA
6 DNN-based systems
6.1 DNN senone posterior i-vector systems
6.2 Bottleneck-feature based systems
6.3 DNN speaker embedding systems (x-vector systems)
7 Score-to-likelihood-ratio conversion (calibration)
8 Validation
8.1 List of published validation studies
9 Conclusion
10 Acknowledgments
11 Appendix A: Mathematical details of T matrix training and i-vector extraction
12 References
Legal references
Preprint
Color Figures
3D Figures
Matlab fig files for Figures 05 and 08
Video Lectures
Lectures by Geoffrey Stewart Morrison, Aston University
This webpage is maintained by Geoffrey Stewart Morrison.
Last update 2020-11-13