Design Of Speaker Recognition Algorithm Based On Long Short-term Memory Networks

Posted on:2021-03-24

Degree:Master

Type:Thesis

Country:China

Candidate:Q H Xu

Full Text:PDF

GTID:2518306569997889

Subject:IC Engineering

Abstract/Summary:

In today's society, illegal acts such as burglary and exam impersonation occur from time to time. They expose a serious problem: traditional identity authentication methods are easily attacked by criminals, leading to information leakage and major security risks. With the arrival of the big-data era and the rapid development of Internet of Things technology, identity authentication methods are becoming more numerous and more secure. Because a person's voiceprint is a unique biological characteristic, it meets current requirements for reliable and distinctive identity authentication, and voiceprint recognition technology has emerged in response. It supports remote authentication, is more convenient and secure than traditional methods, and has broad application prospects. Although researchers have long studied voiceprint recognition, the accuracy of existing algorithms has not yet reached a level that fully reassures the public, and further improvement is urgently needed. This thesis therefore focuses on text-independent voiceprint recognition. It designs and improves both the audio preprocessing of the recognition system and the neural network model on which the system is built, and it introduces multiple features carrying voiceprint information to improve recognition performance.

The thesis first preprocesses the audio in the dataset; the main steps are framing, voice activity detection, and windowing. The preprocessing stage is then improved with the features to be extracted in mind. Reconstructing audio from Linear Prediction Coefficients (LPC) produces large prediction errors in unvoiced segments, so these segments are detected and removed with a relative short-time zero-crossing-rate method. This reduces the number of useless frames passed to feature extraction and makes the data better suited to voiceprint recognition.

After preprocessing, features are extracted following the path of speech from production by the human voice to reception by the human ear. LPC features are extracted to model the human vocal tract, and Log-Mel spectral coefficients are extracted to model human auditory perception. The second-order difference of the LPC features is also computed to capture how they change over time, giving them better dynamic characteristics. Finally, after several feature fusions, the feature combination best suited to voiceprint recognition is obtained.

Having weighed the strengths and weaknesses of traditional and deep-learning algorithms for voiceprint recognition, the thesis adopts the long short-term memory (LSTM) neural network as the basic framework of its recognition system and extends it to a bidirectional LSTM (Bi-LSTM) network. By comparing the softmax loss after 100,000 training iterations for different numbers of hidden layers, a two-layer Bi-LSTM is selected as the final network. Because a similarity vector slows training and makes the algorithm inflexible, it is replaced with a similarity matrix, which resolves these problems and further improves recognition accuracy.

The experiments use the VCTK English corpus, which contains 109 speakers, each with 300 to 400 audio files of 5 to 10 seconds sampled at 48 kHz. Audio from 89 speakers is used for training, and audio from the remaining 20 is used for validation and testing. When the proportion of input speakers is 20%, the average voiceprint recognition accuracy over 100,000 test runs is 96.985%.
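
The abstract does not define its "relative short-time zero-crossing rate" rule, so what follows is only a minimal pure-Python sketch of the framing, unvoiced-frame removal, and windowing steps under stated assumptions: 25 ms frames with a 10 ms hop, a Hamming window, and a hypothetical rule that drops a frame when its zero-crossing rate exceeds `ratio` (assumed 1.5) times the mean rate of the same utterance. Energy-based voice activity detection is not shown, and all function names are illustrative, not taken from the thesis.

```python
import math


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    if len(x) < frame_len:
        return []
    n_frames = 1 + (len(x) - frame_len) // hop
    return [x[i * hop : i * hop + frame_len] for i in range(n_frames)]


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (0 counts as non-negative)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def drop_unvoiced(x, sr, frame_ms=25, hop_ms=10, ratio=1.5):
    """Frame the signal, drop frames whose ZCR exceeds `ratio` times the
    utterance mean (hypothetical relative-ZCR rule), then window the rest."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)
    if not frames:
        return []
    zcrs = [zero_crossing_rate(f) for f in frames]
    threshold = ratio * sum(zcrs) / len(zcrs)  # threshold is relative to this utterance
    window = hamming(frame_len)
    return [
        [s * w for s, w in zip(f, window)]
        for f, z in zip(frames, zcrs)
        if z <= threshold  # low ZCR: treated as voiced and kept
    ]
```

Unvoiced sounds such as fricatives are noise-like and cross zero far more often than voiced speech, which is why a rate measured relative to the utterance (rather than a fixed absolute threshold) is a plausible way to separate them across speakers and recording levels.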
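
The thesis does not say which LPC estimator it uses. The sketch below assumes the common autocorrelation method with the Levinson-Durbin recursion, an illustrative order of 12, and a plain second-order difference d2[t] = c[t+1] - 2c[t] + c[t-1] taken across frames with the edge frames replicated. Regression-based delta formulas, as used in HTK-style front ends, are a common alternative. The function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum_n frame[n] * frame[n + k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion on the autocorrelation sequence.
    Returns a[1..order] such that x[n] is predicted as sum_k a[k] * x[n - k]."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]  # prediction-error energy at order 0
    for i in range(1, order + 1):
        if err <= 0:  # silent or perfectly predicted frame: stop early
            break
        # Reflection coefficient for step i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1 - k * k
    return a[1:]


def second_order_difference(feats):
    """d2[t] = c[t+1] - 2*c[t] + c[t-1] per coefficient, replicating edge frames."""
    T = len(feats)
    out = []
    for t in range(T):
        prev = feats[max(t - 1, 0)]
        nxt = feats[min(t + 1, T - 1)]
        out.append([n - 2 * c + p for p, c, n in zip(prev, feats[t], nxt)])
    return out


def lpc_with_delta2(frames, order=12):
    """Per-frame LPC vector concatenated with its second-order difference."""
    coeffs = [lpc(f, order) for f in frames]
    return [c + d for c, d in zip(coeffs, second_order_difference(coeffs))]
```

For a decaying exponential x[n] = 0.9^n, a first-order predictor recovers a[1] of about 0.9, and higher-order coefficients come out near zero, which is a quick sanity check on the recursion. Concatenating the static LPC vector with its second-order difference doubles the feature dimension per frame.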
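
The abstract says only that a similarity vector was replaced with a similarity matrix trained under a softmax loss; it gives no formula. That description resembles the similarity matrix of the generalized end-to-end (GE2E) loss of Wan et al., so the sketch below assumes that formulation. For the embedding e_ji of utterance i from speaker j and the centroid c_k of speaker k, S[j][i][k] = w * cos(e_ji, c_k) + b, where e_ji is left out of its own speaker's centroid; the loss for e_ji is -S[j][i][j] + log sum_k exp(S[j][i][k]), averaged over utterances. In GE2E, w and b are learnable; here they are fixed at illustrative values (w = 10, b = -5). The embeddings stand in for Bi-LSTM outputs, each speaker is assumed to have at least two utterances, and no embedding may be the zero vector.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def similarity_matrix(emb, w=10.0, b=-5.0):
    """emb[j][i] is the embedding of utterance i of speaker j.
    Returns S with S[j][i][k] = w * cos(emb[j][i], c_k) + b, where c_k is
    speaker k's centroid and, for k == j, utterance i is left out of it."""
    centroids = [mean(spk) for spk in emb]
    S = []
    for j, spk in enumerate(emb):
        rows = []
        for i, e in enumerate(spk):
            row = []
            for k, c in enumerate(centroids):
                if k == j:
                    c = mean(spk[:i] + spk[i + 1:])  # leave-one-out centroid
                row.append(w * cosine(e, c) + b)
            rows.append(row)
        S.append(rows)
    return S


def softmax_loss(S):
    """Mean over utterances of -S[j][i][j] + log(sum_k exp(S[j][i][k]))."""
    total, count = 0.0, 0
    for j, rows in enumerate(S):
        for row in rows:
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_sum = m + math.log(sum(math.exp(s - m) for s in row))
            total += log_sum - row[j]
            count += 1
    return total / count
```

Scoring every utterance against every speaker centroid in one pass is what a matrix formulation buys over scoring one pair at a time; the loss is small when each embedding is closest to its own speaker's centroid and grows when embeddings from different speakers are mixed.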

Keywords/Search Tags:

Speaker Recognition, LSTM, LPC, Log-Mel spectrum, Short-term zero crossing rate

Related items

1	A Short Speech Speaker Recognition Methods And Applications
2	Research On Deep Learning Methods For Use With Speaker Recognition
3	Research On The Speaker Recognition System Under The Short Utterance Based On Deep Learning Theory
4	Tensor-Based High-Order Long-Short Term Memory Network(LSTM)Models
5	Research On Speaker Recognition System Under The Visual C++ 6.0
6	Ultra Short Term Wind Power Prediction Based On ISSA-LSTM
7	Discriminative and generative approaches for long- and short-term speaker characteristics modeling: Application to speaker verification
8	Research On Speaker Recognition Over Short Utterance And Varying Channels
9	Named Entity Recognition Based On LSTM With Hierarchical Residual Connection
10	Research On Pedestrian Crossing Intent Recognition Technology Based On LSTM