Voiceprint recognition, also known as speaker recognition, is a technology that identifies a speaker from speech. As a form of biometrics, voiceprint has application scenarios as rich as those of fingerprints and faces, such as identity verification, intelligent human-computer interaction, and criminal investigation. A voiceprint recognition system mainly comprises three parts: feature extraction, model construction, and scoring decision. The choice of voiceprint features greatly affects recognition performance, so how to select one or more features to represent a speaker's identity is a problem studied by many scholars. On this basis, this thesis carries out research on and application of voiceprint recognition technology based on feature fusion. Considering that a bidirectional long short-term memory (BiLSTM) network can more effectively capture the contextual information between speech frames, this thesis adopts the BiLSTM network as the experimental model. By examining a variety of voiceprint features, it is found that each feature has its own advantages, and fusing multiple features can represent a speaker's identity more effectively. Therefore, this thesis proposes a feature fusion method based on an embedding mechanism. The method addresses not only the problem that a single feature cannot effectively represent a speaker's identity, but also the feature redundancy caused by traditional feature concatenation (stitching) and the training time cost caused by overly large feature dimensions. In addition, we design and develop a voiceprint lock app, which is mainly used to lock mobile phone software with the user's voiceprint. The specific research contents are as follows:

(1) A feature fusion method based on an embedding mechanism is proposed, which solves the problem that a single feature cannot completely represent a speaker's identity without making the feature dimension too large. In the experiments, a long short-term memory (LSTM) network and a bidirectional LSTM network are used as the network models, and the generalized end-to-end (GE2E) loss is used as the loss function. Since Mel-frequency cepstral coefficients (MFCC) are obtained from filter bank coefficients (Fbank) by a discrete cosine transform (DCT), the two features are closely related, so Fbank and MFCC are selected for fusion: the 1st to 13th dimensions of the 40-dimensional Fbank features are replaced with MFCC features to obtain a 40-dimensional fused feature. The experimental results under the two models show that the embedding-based feature fusion improves voiceprint recognition performance compared with either single feature (MFCC or Fbank alone), and that the bidirectional LSTM model performs better.
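As a rough illustration of the embedding-based fusion in (1), the sketch below computes 40-dimensional Fbank (log-Mel filter bank) features and 13-dimensional MFCCs for an utterance and overwrites the first 13 Fbank dimensions with the MFCCs, yielding a 40-dimensional fused feature per frame. This is a minimal sketch assuming librosa for feature extraction; the frame length, hop size, and other settings are placeholders and may differ from those used in the thesis.

```python
import numpy as np
import librosa

def fused_fbank_mfcc(y, sr, n_mels=40, n_mfcc=13, n_fft=400, hop_length=160):
    """Embed 13-dim MFCC into the first 13 dims of a 40-dim Fbank feature."""
    # 40-dim log-Mel filter bank (Fbank) features, shape (n_mels, n_frames)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop_length, n_mels=n_mels)
    fbank = librosa.power_to_db(mel)
    # 13-dim MFCC (DCT of the log-Mel spectrum), shape (n_mfcc, n_frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, n_fft=n_fft,
                                hop_length=hop_length, n_mels=n_mels)
    # Replace dimensions 1-13 of Fbank with MFCC; dimensions 14-40 stay Fbank
    fused = fbank.copy()
    fused[:n_mfcc, :] = mfcc
    # Transpose to (n_frames, 40) before feeding frames to the (Bi)LSTM
    return fused

# Toy usage with a synthetic signal standing in for a real utterance
sr = 16000
y = np.random.randn(sr).astype(np.float32)  # 1 s of noise as a placeholder
features = fused_fbank_mfcc(y, sr)
print(features.shape)  # (40, n_frames)
```

Because the MFCCs occupy existing Fbank dimensions instead of being appended, the fused feature keeps the original 40-dimensional input size of the (Bi)LSTM, which is what avoids the dimension growth and redundancy of plain concatenation.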
(2) To further improve voiceprint recognition performance, different features are selected for experiments. Inverse Mel-frequency cepstral coefficients (IMFCC) use an inverted Mel filter bank, which has characteristics opposite to those of MFCC and is complementary to it to a certain extent. Since MFCC is obtained from Fbank by DCT and the two are closely related, this thesis argues that IMFCC and Fbank are also complementary to some degree, and that latent and complementary speaker information can be obtained through feature fusion. The 27th to 39th dimensions of the 40-dimensional Fbank features are replaced with IMFCC features to obtain a 40-dimensional fused feature. The experiments adopt a bidirectional LSTM network with the generalized end-to-end loss, and the fused feature is also compared with the concatenated (stitched) feature. The results show that, compared with the IMFCC feature, the Fbank feature, and the stitched feature, voiceprint recognition performance is improved by 50%, 24.92%, and 57.12% respectively, with an equal error rate of 2.50%.

(3) A lightweight model, ResCNN-light, is proposed, which reduces the parameters of the ResCNN model from 24M to 5M while improving voiceprint recognition performance. The architecture of the ResCNN model is first studied, and the last two residual blocks are removed. Experiments on the VCTK and LibriSpeech datasets verify the advantages of the resulting ResCNN-light model, with the EER reduced to as low as 3.88%.

(4) A voiceprint lock app is designed and developed. The software is developed on the Android Studio platform in Java. Its main functions include voiceprint registration, voiceprint authentication, and application locking. When the software is started for the first time, voiceprint registration (training) is required; during registration, five strings of random digits must be read aloud. After registration, the user can view the voiceprint model ID or delete the existing speaker model and retrain. Voiceprint authentication verifies whether the current speaker is consistent with the speaker in the current model: if their similarity reaches a certain threshold, they are recognized as the same speaker. To lock an application, the app must first be granted access to read the list of installed applications on the phone; the selected application can then be locked with the voiceprint. Unlocking follows the same process as voiceprint authentication, so only the same speaker can unlock it.
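The accept/reject decision in (4) can be illustrated as follows. This is a minimal sketch that assumes each utterance has already been mapped to a fixed-length speaker embedding by the trained model; averaging the five enrollment embeddings into one speaker model, the embedding size of 256, and the 0.75 threshold are illustrative assumptions rather than values from the thesis, and Python is used here only for brevity although the app itself is implemented in Java.

```python
import numpy as np

def enroll(utterance_embeddings):
    """Enrollment: average (then renormalize) the embeddings of the five
    registration utterances into a single speaker model (illustrative choice)."""
    m = np.mean(utterance_embeddings, axis=0)
    return m / np.linalg.norm(m)

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(speaker_model, test_embedding, threshold=0.75):
    """Accept the test utterance only if its similarity to the enrolled
    speaker model reaches the threshold (0.75 is a placeholder value)."""
    return cosine_similarity(speaker_model, test_embedding) >= threshold

# Toy usage with random vectors standing in for BiLSTM speaker embeddings
rng = np.random.default_rng(0)
enrolled = enroll([rng.normal(size=256) for _ in range(5)])
test = rng.normal(size=256)
print(verify(enrolled, test))
```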