| Speaker recognition (SR), which identifies or verifies people by their voice, is regarded as the most natural and convenient of the biometric methods. Most state-of-the-art features used in SR systems reflect only the frequency response of the speaker's vocal tract and do not take into account the potential influence of glottal vibration on them. In this thesis, we first study the main components of SR systems, their current advances, and, in particular, recent methods of using glottal features. On this basis, we propose models and algorithms that make better use of glottal features in SR systems. The main contributions of this work are as follows: 1. A model and a compensation algorithm are proposed that alleviate the influence of glottal changes on cepstral features in SR systems. We point out that cepstral features unaffected by such changes are more discriminative than traditional ones, and we obtain the compensated features by using both long-term and short-term glottal features. The algorithm improves performance in an ideal environment. Moreover, owing to the robustness of glottal features, we also successfully extend the algorithm to a complex multi-channel environment, where it performs well on the SRMC database of 303 speakers. 2. We propose Parallel Gaussian Mixture Models, which exploit, at the model level, the correlation between short-term glottal features and cepstral features. By assuming that cepstral features are independent of glottal ones, traditional speaker models ignore the potential influence that enlarges the intra-speaker distance. In contrast, Parallel Gaussian Mixture Models use the short-term glottal features and cepstral features jointly. With prior knowledge of the glottal features, a single probability model characterizes this relationship, which improves the performance of the system. 3.
Glottal Information Based Cepstral Mean Subtraction (GIBCMS), which makes use of glottal features in noisy environments, is presented. As is well known, noise and channel effects cause a mismatch between training and testing conditions, which leads to a dramatic degradation in performance. Cepstral Mean Subtraction (CMS) is one of the standard techniques for removing noise/channel differences. Considering the robustness of glottal features against noise and channel effects, the proposed method divides the speech in a non-linear way based on glottal features and then builds a non-linear model of the noise/channel. Without prior knowledge of the frequency response of the channel, GIBCMS greatly increases the correctness of the division. On the noisy YOHO database at an SNR of 5 dB, the identification rate increases by 18%, outperforming other Cepstral Mean Subtraction methods. This work is supported by the National Natural Science Foundation of P. R. China (60273059), the Zhejiang Provincial Natural Science Foundation for Young Scientists of P. R. China (RC01058), the Zhejiang Provincial Natural Science Foundation (M603229), and the National Doctoral Subject Foundation (20020335025). |
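
For readers unfamiliar with Cepstral Mean Subtraction, the standard technique mentioned in the third contribution can be sketched as follows. A stationary channel appears as an approximately constant offset in the cepstral domain, so subtracting the per-utterance mean removes it. The second function is only a hypothetical illustration of subtracting a separate mean for each group of frames; the function names, the `frame_groups` labels, and the way such labels might be derived from glottal features are assumptions made here, and the sketch should not be read as the GIBCMS algorithm itself.

```python
import numpy as np


def cepstral_mean_subtraction(cepstra):
    """Standard CMS: subtract the per-utterance mean from every frame.

    cepstra: array of shape (num_frames, num_coeffs).
    """
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)


def groupwise_cms(cepstra, frame_groups):
    """Hypothetical variant: subtract a separate mean for each frame group.

    frame_groups: one integer label per frame. How the labels are obtained
    (for example from a glottal feature) is an assumption, not something
    specified by the abstract.
    """
    cepstra = np.asarray(cepstra, dtype=float)
    frame_groups = np.asarray(frame_groups)
    out = np.empty_like(cepstra)
    for group in np.unique(frame_groups):
        mask = frame_groups == group
        out[mask] = cepstra[mask] - cepstra[mask].mean(axis=0, keepdims=True)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(200, 12))      # synthetic "cepstra"
    channel = rng.normal(size=(1, 12))      # constant channel offset
    observed = clean + channel
    # After CMS, the constant offset no longer distinguishes the two signals.
    print(np.allclose(cepstral_mean_subtraction(observed),
                      cepstral_mean_subtraction(clean)))
```

In this sketch the group-wise variant reduces to standard CMS when all frames share one label, which is one way to see it as a generalization rather than a replacement.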