QMOS - a Robust Visualization Method for Speaker Dependencies With Different Microphones
Andreas Maier, Maria Schuster, Ulrich Eysholdt, Tino Haderlein, Tobias Cincarek, Stefan Steidl, Anton Batliner, Stefan Wenhardt, Elmar Nöth
JPRR Vol 4, No 1 (2009); doi:10.13176/11.112 
Abstract
There are several methods to create visualizations of speech data. All of them, however, lack the ability to remove microphone-dependent distortions. In this work we examined the use of Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), and the COmprehensive Space Map of Objective Signal (COSMOS) method. To address the lack of microphone independence in PCA, LDA, and COSMOS, we present two methods that reduce the influence of the recording conditions on the visualization. The first is a rigid registration of maps created from identical speakers recorded under different conditions, i.e., different microphones and distances. The second is an extension of the COSMOS method, which performs a non-rigid registration during the mapping procedure. As measures for the quality of a visualization, we computed the mapping error, which occurs during the dimension reduction, and the grouping error, defined as the average distance between the representations of the same speaker recorded by different microphones. The best linear method in a leave-one-speaker-out evaluation is PCA plus rigid registration, with a mapping error of 47% and a grouping error of 18%. The proposed extension of COSMOS, however, outperforms this with a mapping error of 24% and a grouping error close to zero.
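To make the first approach (PCA plus rigid registration) concrete, the following is a minimal sketch in Python/NumPy, not the authors' implementation: the toy data, the helper names (pca_project, rigid_register, grouping_error), and the use of orthogonal Procrustes alignment as the rigid registration are all assumptions for illustration.

```python
import numpy as np

def pca_project(X, dim=2):
    """Project feature vectors onto the top `dim` principal components."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T

def rigid_register(A, B):
    """Rigidly align the point set A onto B (orthogonal Procrustes).
    Rows of A and B correspond to the same speakers under two
    recording conditions. Note: this variant also permits reflections."""
    muA, muB = A.mean(axis=0), B.mean(axis=0)
    U, _, Vt = np.linalg.svd((A - muA).T @ (B - muB))
    R = U @ Vt  # optimal orthogonal transform
    return (A - muA) @ R + muB

def grouping_error(A, B):
    """Average distance between the two representations of each speaker."""
    return np.linalg.norm(A - B, axis=1).mean()

# Hypothetical data: 10 speakers x 20-dim features from two microphones.
rng = np.random.default_rng(0)
close_talk = rng.normal(size=(10, 20))
distant = close_talk + 0.1 * rng.normal(size=(10, 20))  # distorted copy

map_close = pca_project(close_talk)
map_dist = pca_project(distant)
print("grouping error before registration:",
      grouping_error(map_close, map_dist))
print("grouping error after registration: ",
      grouping_error(rigid_register(map_dist, map_close), map_close))
```

Registration helps here because PCA maps of the two conditions can differ by arbitrary rotations and sign flips of the principal axes; aligning one map onto the other removes exactly these rigid differences, which is what the grouping error is meant to expose.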