Speaker recognition, also known as voiceprint recognition, is one of the most widely used biometric identification technologies; it identifies a speaker based on his or her voice. Compared with other biometric modalities, speaker recognition lends itself naturally to remote authentication. With the popularity of smartphones, voice collection has become more convenient, and the advantages of speaker recognition have become more prominent. In the mobile internet environment, users only need to record a short utterance on their mobile phones to complete remote identity authentication, without contacting any special equipment, so user acceptance is high. Because of these advantages, speaker recognition has received wide attention and study.

In recent years, deep learning has achieved remarkable results in many areas. Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are complementary in their modeling capabilities: CNNs are good at extracting features from images, while RNNs are good at temporal modeling. Inspired by this complementarity, this thesis combines CNNs and RNNs into a unified architecture for speaker identification, called the CDRNN model. The model first generates spectrograms from the speaker's speech data, then uses a CNN to automatically extract the speaker's characteristics from the spectrograms, and finally feeds the CNN features into a deep RNN for classification (a minimal architecture sketch is given after this abstract).

Based on the CDRNN model, this thesis also carries out the following work:

(1) To verify the effectiveness of the CDRNN model for speaker recognition, this thesis compares the CDRNN model with a classical speaker recognition method on the same speech data set collected in a real environment. The experimental results show that the recognition accuracy of the CDRNN model is higher than that of the classical method for different numbers of speakers, so the CDRNN model is effective.

(2) The network model in CDRNN is composed of CNNs and RNNs. To study its performance in speaker identification, this thesis compares it experimentally with other deep network models in terms of speaker feature extraction and speaker modeling ability. The experimental results on the self-constructed data set show that the network model in CDRNN outperforms the other deep network models.

(3) This thesis implements the network model in CDRNN on the deep learning framework TensorFlow, ports the trained network model to a mobile platform, and finally implements a mobile speaker recognition prototype system (an export sketch for the mobile port is given below).
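
The following is a minimal sketch of a CNN-plus-deep-RNN speaker-identification network in the spirit of the CDRNN described above, written with the Keras API of TensorFlow. The spectrogram input shape, layer widths, pooling configuration, and number of speakers are illustrative assumptions, not the settings used in the thesis.

```python
# Minimal CDRNN-style sketch: CNN front end over spectrograms, deep RNN back end.
# All shapes and hyperparameters below are assumed for illustration only.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_SPEAKERS = 100          # assumed number of enrolled speakers
SPEC_SHAPE = (128, 300, 1)  # assumed spectrogram: freq bins x time frames x channel

def build_cdrnn(num_speakers=NUM_SPEAKERS, input_shape=SPEC_SHAPE):
    inputs = layers.Input(shape=input_shape)

    # CNN front end: extracts local time-frequency features from the spectrogram.
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)

    # Reshape the feature maps into a sequence (time steps x feature vector).
    # After two 2x2 poolings: freq 128 -> 32, time 300 -> 75, channels -> 64.
    x = layers.Permute((2, 1, 3))(x)          # (time, freq, channels)
    x = layers.Reshape((75, 32 * 64))(x)      # (time, flattened features)

    # Deep RNN back end: models the temporal dynamics of the CNN features.
    x = layers.LSTM(256, return_sequences=True)(x)
    x = layers.LSTM(256)(x)

    # Softmax over the enrolled speakers for closed-set identification.
    outputs = layers.Dense(num_speakers, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_cdrnn()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The choice of LSTM cells and the exact reshaping between the convolutional and recurrent stages are assumptions; any stacking of recurrent layers over CNN feature maps follows the same pattern.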
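
For the mobile prototype in item (3), the thesis states only that the trained network was ported to a mobile platform. The sketch below shows one common way to do this with TensorFlow Lite; the conversion route and the file names are assumptions, not the thesis procedure.

```python
# Sketch: exporting a trained Keras model for mobile inference via TensorFlow Lite.
# The source model path and output path are assumed for illustration.
import tensorflow as tf

model = tf.keras.models.load_model("cdrnn_speaker_model.h5")  # assumed path

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

with open("cdrnn_speaker_model.tflite", "wb") as f:
    f.write(tflite_model)
```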