Design Of Speaker Recognition Algorithm Based On Long Short-term Memory Networks

Posted on: 2021-03-24   Degree: Master   Type: Thesis
Country: China   Candidate: Q H Xu   Full Text: PDF
GTID: 2518306569997889   Subject: IC Engineering
Abstract/Summary:
In today's society, illegal acts such as burglary and exam impersonation occur from time to time. They expose a serious problem: traditional identity authentication methods are easily attacked by criminals, leading to information leakage and significant security risks. With the advent of the big data era and the rapid development of Internet of Things technology, identity authentication methods are becoming both more numerous and more secure. As a biological characteristic unique to each individual, the voiceprint meets current requirements for reliability and uniqueness in identity authentication, and voiceprint recognition technology has emerged in response. It supports remote authentication, is convenient and safe, and has broad application prospects. Although voiceprint recognition has long been studied, the accuracy of existing algorithms has not yet reached a level that fully reassures the public, and further improvement is urgently needed. This thesis therefore focuses on text-independent voiceprint recognition. It designs and improves the audio preprocessing stage and the neural network model used to build the recognition system, and introduces multiple features carrying voiceprint information to improve recognition performance.

The thesis first preprocesses the audio data in the dataset, mainly through framing, voice activity detection and windowing. The preprocessing stage is then improved based on the extracted features. After studying the large prediction errors that arise in unvoiced segments when the audio is restored from Linear Prediction Coefficients (LPC), the thesis detects and removes the unvoiced segments using a relative short-time zero-crossing rate method. This reduces the number of useless audio frames passed to feature extraction and makes the audio better suited to voiceprint recognition.

After preprocessing, the thesis extracts audio features following the path from human voice production to reception by the human ear. The LPC feature is extracted to model the human vocal tract, and the Log-Mel spectrum coefficients are extracted to model human auditory perception. The second-order difference of the LPC feature is also extracted to capture how the LPC changes over time, giving the feature better dynamic characteristics. Finally, after multiple feature fusions, the voiceprint feature best suited to voiceprint recognition is obtained.

After studying the advantages and disadvantages of traditional algorithms and deep learning algorithms for voiceprint recognition, the thesis chooses the long short-term memory (LSTM) neural network as the basic framework and optimizes it into a bidirectional LSTM (Bi-LSTM) network. By comparing the value of the softmax loss function after 100,000 rounds of training with different numbers of hidden layers, the two-layer Bi-LSTM is selected as the final network. To address the slow training and poor flexibility caused by the similarity vector, the similarity vector is replaced with a similarity matrix, which also further improves recognition accuracy.

The dataset used in this thesis is the VCTK English corpus. The corpus contains 109 speakers, each with 300-400 audio files of 5-10 seconds at a sampling rate of 48 kHz. Audio files from 89 speakers were used for training, and those from the remaining 20 were used for validation and testing. When the proportion of input speakers is 20%, the average voiceprint recognition accuracy over 100,000 sets of tests is 96.985%.
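The abstract's unvoiced-frame removal step can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the frame length, the hop size, the toy signal and the fixed threshold of 0.3 are all assumptions chosen for illustration. It uses a plain short-time zero-crossing rate rather than the relative variant the abstract mentions. The idea is that noise-like unvoiced frames change sign far more often than voiced frames, so frames whose zero-crossing rate exceeds a threshold are dropped before feature extraction.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into fixed-length frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def drop_unvoiced(frames, zcr_threshold):
    """Keep only frames whose zero-crossing rate is at or below the threshold."""
    return [f for f in frames if zero_crossing_rate(f) <= zcr_threshold]

# Toy signal (assumed): a 100 Hz sine stands in for voiced speech,
# and a sign-alternating sequence stands in for a noise-like unvoiced segment.
sample_rate = 8000
voiced = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(400)]
unvoiced = [0.1 * (1 if n % 2 == 0 else -1) for n in range(400)]

frames = frame_signal(voiced + unvoiced, frame_len=200, hop=200)
kept = drop_unvoiced(frames, zcr_threshold=0.3)  # threshold is an assumed value
print(len(frames), len(kept))
```

In practice the threshold would need tuning on real speech, and a relative criterion (for example, comparing each frame's rate with a statistic over the whole utterance) may be closer to what the thesis describes.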
Keywords/Search Tags: Speaker Recognition, LSTM, LPC, Log-Mel spectrum, Short-term zero-crossing rate