[Announcement] 3 papers for IEEE - Odyssey 2008 - Announcement

3 Oct 2007

We are pleased to announce that the following papers have been accepted 
to The Speaker and Language Recognition Workshop (IEEE - Odyssey 2008 --
http://www.speakerodyssey.com).

** Reda Dehak, Najim Dehak, Patrick Kenny, Pierre Dumouchel. Kernel 
Combination for SVM Speaker Verification.

http://publis.lrde.epita.fr/200709-ODYSSEY-A

We present a new approach for constructing the kernels used to build 
support vector machines for speaker verification. The idea is to 
construct new kernels by taking a linear combination of several kernels, such 
as the GLDS and GMM supervector kernels. In this new kernel combination, 
the combination weights are speaker dependent, rather than universal 
weights applied at score-level fusion, and no extra data is needed to 
estimate them. An experiment on the NIST 2006 speaker recognition 
evaluation dataset (all trials) was carried out using three different kernel 
functions (GLDS kernel, linear and Gaussian GMM supervector kernels). We 
compared our kernel combination to the optimal linear score fusion 
obtained using logistic regression. This optimal score fusion was 
trained on the same test data. We obtained an equal error rate of $\simeq 
5.9\%$ using the kernel combination technique, which is better than the 
optimal score fusion system ($\simeq 6.0\%$).
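
As a rough illustration of the kernel combination idea in this abstract, the 
sketch below forms a weighted sum of two Gram matrices. It is only a minimal 
example under assumptions: the function names, the toy data, and the fixed 
weights are hypothetical, and the abstract does not say how the 
speaker-dependent weights are estimated, so that step is not shown.

```python
import numpy as np

def linear_kernel(X, Y):
    """Plain dot-product kernel between the rows of X and the rows of Y."""
    return X @ Y.T

def gaussian_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an arbitrary illustrative value."""
    sq_dist = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def combine_kernels(grams, weights):
    """Weighted sum of Gram matrices; non-negative weights keep the result a valid kernel."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0):
        raise ValueError("weights must be non-negative")
    return sum(w * K for w, K in zip(weights, grams))

# Toy stand-in for supervectors: 6 utterances, 4 dimensions (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

# Hypothetical weights; in the paper they are speaker dependent.
K = combine_kernels([linear_kernel(X, X), gaussian_kernel(X, X)], [0.3, 0.7])
```

A combined Gram matrix of this kind can be passed to any SVM trainer that 
accepts precomputed kernels.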

** Reda Dehak, Najim Dehak, Patrick Kenny, Pierre Dumouchel. Comparison 
Between Factor Analysis and GMM Support Vector Machines for Speaker 
Verification.

http://publis.lrde.epita.fr/200709-ODYSSEY-B

We present a comparison between speaker verification systems based on 
factor analysis modeling and support vector machines using GMM 
supervectors as features. All systems used the same acoustic features 
and they were trained and tested on the same data sets. We tested two 
types of kernel (one linear, the other non-linear) for the GMM support 
vector machines. The results show that factor analysis using speaker 
factors gives the best results on the core condition of the NIST 2006 
speaker recognition evaluation. The difference is particularly marked on 
the English language subset. Fusion of all systems gave an equal error 
rate of 4.2% (all trials) and 3.2% (English trials only).
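
For readers unfamiliar with the equal error rate quoted in these abstracts, 
the sketch below shows one simple way to estimate it from two lists of 
scores. The function name and the toy scores are hypothetical, and this 
threshold sweep is only one of several common estimators, not necessarily 
the one used by the authors.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping the threshold over all observed scores."""
    target_scores = np.asarray(target_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    # Miss: a target trial scored below the threshold (rejected).
    p_miss = np.array([(target_scores < t).mean() for t in thresholds])
    # False alarm: an impostor trial scored at or above the threshold (accepted).
    p_fa = np.array([(impostor_scores >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest and average them.
    i = np.argmin(np.abs(p_miss - p_fa))
    return (p_miss[i] + p_fa[i]) / 2.0

# Toy scores (hypothetical): one target trial scores below two impostor trials.
eer = equal_error_rate([0.0, 2.0, 3.0, 4.0], [-2.0, -1.0, 0.5, 1.0])
```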

** Patrick Kenny, Najim Dehak, Reda Dehak, Vishwa Gupta, Pierre 
Dumouchel. The Role of Speaker Factors in the NIST Extended Data Task.

http://publis.lrde.epita.fr/200709-ODYSSEY-C

We tested factor analysis models having various numbers of speaker 
factors on the core condition and the extended data condition of the 
2006 NIST speaker recognition evaluation. In order to ensure strict 
disjointness between training and test sets, the factor analysis models 
were trained without using any of the data made available for the 2005 
evaluation. The factor analysis training set consisted primarily of 
Switchboard data and so was to some degree mismatched with the 2006 test 
data (drawn from the Mixer collection). Consequently, our initial 
results were not as good as those submitted for the 2006 evaluation. 
However, we found that we could compensate for this by a simple 
modification to our score normalization strategy, namely by using 1000 
z-norm utterances in zt-norm.
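
The zt-norm mentioned above is usually described as z-norm followed by 
t-norm. The sketch below follows that common formulation under stated 
assumptions: the function names, the random toy scores, and the cohort of 50 
t-norm models are hypothetical; only the figure of 1000 z-norm utterances 
comes from the abstract. It is not the authors' implementation.

```python
import numpy as np

def z_norm(scores, z_scores):
    """Normalise each model's score by the mean/std of that model's scores on z-norm utterances."""
    return (scores - z_scores.mean(axis=-1)) / z_scores.std(axis=-1)

def zt_norm(raw, target_vs_z, tmodels_vs_test, tmodels_vs_z):
    """z-norm the target score, z-norm the cohort scores, then apply t-norm."""
    z_target = z_norm(raw, target_vs_z)              # target model vs. test utterance
    z_cohort = z_norm(tmodels_vs_test, tmodels_vs_z)  # each t-norm model vs. test utterance
    return (z_target - z_cohort.mean()) / z_cohort.std()

rng = np.random.default_rng(0)
n_z, n_t = 1000, 50                           # n_t is a hypothetical cohort size
raw = 2.5                                     # raw score: target model vs. test utterance
target_vs_z = rng.normal(size=n_z)            # target model vs. z-norm utterances
tmodels_vs_test = rng.normal(size=n_t)        # t-norm models vs. test utterance
tmodels_vs_z = rng.normal(size=(n_t, n_z))    # t-norm models vs. z-norm utterances

score = zt_norm(raw, target_vs_z, tmodels_vs_test, tmodels_vs_z)
```

One property of this formulation is that the normalised score is unchanged 
if all raw scores are shifted or rescaled by a common positive factor.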

Our purpose in varying the number of speaker factors was to evaluate the 
eigenvoice MAP and classical MAP components of the inter-speaker 
variability model in factor analysis. We found that on the core 
condition (i.e. 2-3 minutes of enrollment data), only the eigenvoice MAP 
component plays a useful role. On the other hand, on the extended data 
condition (i.e. 15-20 minutes of enrollment data) both the classical MAP 
component and the eigenvoice component proved to be useful provided that 
the number of speaker factors was limited. Our best result on the 
extended data condition (all trials) was an equal error rate of 2.2% and 
a detection cost of 0.011.
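
The detection cost quoted above is a weighted sum of the miss and false 
alarm rates. The sketch below shows the general form. The default parameter 
values (cost of a miss 10, cost of a false alarm 1, target prior 0.01) are 
the ones commonly cited for NIST evaluations of that period and are an 
assumption here, not taken from the abstract; the function name and the 
example error rates are hypothetical.

```python
def detection_cost(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Weighted detection cost: C_miss * P_miss * P_target + C_fa * P_fa * (1 - P_target)."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

# Hypothetical operating point: 10% misses and 1% false alarms.
cost = detection_cost(p_miss=0.10, p_fa=0.01)
```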

-- 
Daniela Becker