Speaker recognition, also known as voiceprint recognition, is one of the most widely used biometric identification technologies; it identifies a speaker based on his or her voice. Compared with other biometric modalities, speaker recognition lends itself naturally to remote authentication. With the popularity of smartphones, voice collection has become more convenient, and the advantages of speaker recognition have become more prominent. In the mobile internet environment, users only need to record a short utterance on their mobile phones to complete remote identity authentication, without contacting any special equipment, so user acceptance is high. Because of these advantages, speaker recognition has received wide attention and study.

In recent years, deep learning has achieved remarkable results in many areas. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are complementary in their modeling capabilities: CNNs are good at extracting features from images, while RNNs are good at temporal modeling. Inspired by this complementarity, this thesis combines CNNs and RNNs into a unified architecture for speaker identification, called the CDRNN model. The model first generates spectrograms from the speaker's speech data, then uses a CNN to automatically extract the speaker's characteristics from the spectrograms, and finally feeds the CNN features into a deep RNN for classification (a minimal architecture sketch is given after this abstract).

Based on the CDRNN model, this thesis also carries out the following work:

(1) To verify the effectiveness of the CDRNN model for speaker recognition, this thesis compares the CDRNN model with a classical speaker recognition method on the same speech data set collected in a real environment. The experimental results show that the recognition accuracy of the CDRNN model is higher than that of the classical method for different numbers of speakers, so the CDRNN model is effective.

(2) The network model in CDRNN is composed of CNNs and RNNs. To study its performance in speaker identification, this thesis compares it experimentally with other deep network models in terms of speaker feature extraction and speaker modeling ability. The experimental results on the self-constructed data set show that the network model in CDRNN outperforms the other deep network models.

(3) This thesis implements the network model in CDRNN on the deep learning framework TensorFlow, ports the trained network model to a mobile platform, and finally implements a mobile speaker recognition prototype system (an export sketch for the mobile port is given below).
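
The following is a minimal sketch of a CNN-plus-deep-RNN speaker-identification network in the spirit of the CDRNN described above, written with the Keras API of TensorFlow. The spectrogram input shape, layer widths, pooling configuration, and number of speakers are illustrative assumptions, not the settings used in the thesis.

```python
# Minimal CDRNN-style sketch: CNN front end over spectrograms, deep RNN back end.
# All shapes and hyperparameters below are assumed for illustration only.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_SPEAKERS = 100          # assumed number of enrolled speakers
SPEC_SHAPE = (128, 300, 1)  # assumed spectrogram: freq bins x time frames x channel

def build_cdrnn(num_speakers=NUM_SPEAKERS, input_shape=SPEC_SHAPE):
    inputs = layers.Input(shape=input_shape)

    # CNN front end: extracts local time-frequency features from the spectrogram.
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)

    # Reshape the feature maps into a sequence (time steps x feature vector).
    # After two 2x2 poolings: freq 128 -> 32, time 300 -> 75, channels -> 64.
    x = layers.Permute((2, 1, 3))(x)          # (time, freq, channels)
    x = layers.Reshape((75, 32 * 64))(x)      # (time, flattened features)

    # Deep RNN back end: models the temporal dynamics of the CNN features.
    x = layers.LSTM(256, return_sequences=True)(x)
    x = layers.LSTM(256)(x)

    # Softmax over the enrolled speakers for closed-set identification.
    outputs = layers.Dense(num_speakers, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_cdrnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The choice of LSTM cells and the exact reshaping between the convolutional and recurrent stages are assumptions; any stacking of recurrent layers over CNN feature maps follows the same pattern.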
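
For the mobile prototype in item (3), the thesis states only that the trained network was ported to a mobile platform. The sketch below shows one common way to do this with TensorFlow Lite; the conversion route and the file names are assumptions, not the thesis procedure.

```python
# Sketch: exporting a trained Keras model for mobile inference via TensorFlow Lite.
# The source model path and output path are assumed for illustration.
import tensorflow as tf

model = tf.keras.models.load_model("cdrnn_speaker_model.h5")  # assumed path

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

with open("cdrnn_speaker_model.tflite", "wb") as f:
    f.write(tflite_model)
```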