3D Convolutional Neural Networks Based Speaker Identification And Authentication

Posted on: 2020-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: J G Liao
Full Text: PDF
GTID: 2428330623463752
Subject: Electronic and communication engineering
Abstract/Summary:
Human biometric features such as fingerprints, faces and irises are widely used for identity recognition and authentication because of their convenience and security, which greatly facilitates people's lives. Recent research shows that lip features encode both the speaker's unique lip physiology and his or her speaking habits, and can therefore be used for speaker identification and authentication. In addition, lip features can serve as an effective complement to other biometric features: for example, combining face recognition with lip characteristics strengthens liveness detection, and combining speech recognition with lip characteristics improves recognition in noisy environments. Studying the extraction and application of lip features is therefore of great significance.

The difficulty of lip-based speaker recognition and authentication lies in extracting lip features, which must capture both the static appearance of the lips and their dynamic deformation during speech. Traditional methods such as lip contour extraction, texture feature extraction and sparse coding can extract the speaker's identity information, but their performance is unsatisfactory under varying lighting, viewing angles and distances.

In this thesis, a novel end-to-end method based on a 3D convolutional neural network (3DCNN) is proposed to extract discriminative spatiotemporal features directly from raw lip video streams. The lip video is first divided into a series of overlapping clips. For each clip, a lip-characteristics network is proposed to characterize the fine detail of the lip region and its movement. The entire lip video is then represented by the set of sub-features corresponding to its clips. Experiments on a dataset of 200 speakers show that the proposed method achieves a high identification accuracy of 99.18% and a very low authentication error (half total error rate, HTER, of 0.15%, i.e. the mean of the false acceptance and false rejection rates). Compared with several state-of-the-art methods, the proposed approach achieves better performance and is more robust to variations in speaker pose and position. The method also achieves satisfactory results on the in-the-wild VoxCeleb2 dataset, which contains nearly 3,000 speakers.
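To make the clip-based pipeline concrete, the following is a minimal PyTorch sketch of the idea described above: a lip video is split into overlapping clips along the time axis, and a small 3D-convolutional encoder produces one spatiotemporal embedding per clip. The layer sizes, clip length, stride and embedding dimension are illustrative assumptions; the thesis does not publish the exact architecture of its lip-characteristics network.

```python
# Sketch only: clip splitting + per-clip 3DCNN embedding.
# All hyperparameters below are hypothetical, not the thesis's actual values.
import torch
import torch.nn as nn


def split_into_clips(video: torch.Tensor, clip_len: int = 16, stride: int = 8):
    """Divide a lip video (C, T, H, W) into overlapping clips along time."""
    c, t, h, w = video.shape
    clips = [video[:, s:s + clip_len] for s in range(0, t - clip_len + 1, stride)]
    return torch.stack(clips)  # (num_clips, C, clip_len, H, W)


class LipClipNet(nn.Module):
    """Toy 3D-convolutional encoder: one spatiotemporal embedding per clip."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),  # joint space-time conv
            nn.BatchNorm3d(32), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),                     # pool space only at first
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm3d(64), nn.ReLU(),
            nn.MaxPool3d(2),                             # pool time and space
            nn.AdaptiveAvgPool3d(1),
        )
        self.embed = nn.Linear(64, embed_dim)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (num_clips, 3, clip_len, H, W) -> (num_clips, embed_dim)
        x = self.features(clips).flatten(1)
        return self.embed(x)


if __name__ == "__main__":
    video = torch.randn(3, 48, 64, 64)              # 48-frame RGB lip sequence
    clip_feats = LipClipNet()(split_into_clips(video))
    print(clip_feats.shape)                          # torch.Size([5, 128])
```

In such a setup, identification would compare the set of clip embeddings against enrolled speakers, and authentication would threshold a similarity score, which is how an HTER can be measured; the exact matching scheme used in the thesis is not specified here.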
Keywords/Search Tags:Visual speaker identification, Visual speaker authentication, 3DCNN, Lip feature