Owing to its flexibility, economy, and accuracy, speaker recognition technology has broad application prospects in biometric security. However, although state-of-the-art speaker recognition techniques perform well under ideal conditions, their performance degrades markedly in practical conditions. Improving system robustness has therefore become one of the most active research fields. Because the problem touches every part of the system, robustness can be improved in many different ways. Essential approaches include noisy speech detection, robust features, modeling under noise with limited training data, and channel distortion compensation. This thesis systematically reviews existing work and proposes several novel approaches:
1) Dynamic Multi-Feature detection approach based on Reliability Measure (RM-DMF): Depending on the reliability measure, this approach dynamically chooses the most suitable feature parameter for detecting speech under noise and adjusts the threshold automatically. Experimental results under different noise types show that RM-DMF detects speech more accurately than three existing methods;
2) Pitch Detection Algorithm with Multi-Phase Filter Bank (MPFB-PDA): We apply, for the first time, the multi-phase filter bank technique used in audio compression to pitch detection, together with a novel voiced/unvoiced decision. Comparative experiments show that MPFB-PDA detects the endpoints of voiced/unvoiced speech more accurately than other common methods, and that its fast algorithm lowers the computational cost (reducing the number of multiplications by more than 80%);
3) An improved Pitch/Energy Contour feature (PEC): Combined with MPFB-PDA, PEC extends traditional prosodic features and makes them usable in text-independent settings.
Text-independent speaker identification experiments show that the recognition rate increases by 4.4% when PEC is used;
4) Multi-Eigen-Space model technique based on Regression Classes (RC-MES): RC-MES addresses a shortcoming of the traditional eigen-space approach [4.6], which ignores differences between phonemes and confuses them with differences between speakers. The results in Table 4.4 show that, when the training data contains only 10 s of speech, RC-MES improves the recognition rate from 90.8% to 95.2%;
5) A novel Speech-Background Integrated Model (SBIM): This technique combines the merits of RC-MES and the conventional integrated model approach, and better addresses the problem of training speaker models under noise with limited training data. Speaker identification experiments show that SBIM improves performance under different kinds of noise with 20 s of training speech;
6) An Extended Feature Mapping approach (EFM): EFM refines the mapping between feature vectors and Gaussian components used in conventional feature mapping, and improves distortion compensation. The corresponding speaker verification experiments give more robust results, with the equal error rate reduced from 9.86% to 9.62%;
7) Nonlinear Feature Mapping based on Radial Basis Function (RBF-NFM): We adopt an RBF network to handle the nonlinear distortion that arises in speech transmission. When combined with a GMM, the training process of the RBF network can be reduced. Experimental results show that RBF-NFM compensates for nonlinear distortion better, with the equal error rate reduced from 10.98% to 9.69%.
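
The speaker verification results above are reported as equal error rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. As a point of reference only, the following is a minimal sketch of how an EER might be estimated from trial scores. It assumes that higher scores indicate the claimed speaker; the function name `equal_error_rate`, the simple threshold sweep, and the toy scores are illustrative assumptions, not the evaluation procedure used in this thesis.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER and the threshold at which it occurs.

    Assumption: a trial is accepted when its score is >= threshold.
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Candidate thresholds: every observed score, in ascending order.
    thresholds = np.sort(np.concatenate([target, impostor]))

    # False rejection rate: fraction of genuine trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance rate: fraction of impostor trials scoring at or above it.
    far = np.array([(impostor >= t).mean() for t in thresholds])

    # Pick the threshold where the two error rates are closest,
    # and report their average as the EER estimate.
    i = int(np.argmin(np.abs(far - frr)))
    return (far[i] + frr[i]) / 2.0, thresholds[i]

# Toy scores (hypothetical, for illustration only).
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

With only a handful of trials the estimate is coarse; on a real evaluation set the sweep runs over many more scores, and interpolation between adjacent thresholds is a common refinement.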