Research On Technology Of Real-time Lip Reading Based On Kinect 3D Camera

Posted on:2018-07-05

Degree:Master

Type:Thesis

Country:China

Candidate:S Yue

GTID:2348330542485010

Subject:Software engineering

Abstract/Summary:

Lip-reading technology can serve as a supplementary modality in multi-modal speech recognition systems, improving their robustness and accuracy and extending the range of scenarios in which they can be applied. It can also help hearing-impaired people communicate normally and recover language function, and it can serve as a new type of encoding in certain specific scenarios. Traditional lip-reading research based on two-dimensional video has made great progress, and with the development of three-dimensional imaging technology, lip-reading research has even broader prospects. This thesis studies real-time lip-reading technology that uses the Kinect sensor to acquire 3D data of speakers' faces. The system consists of a data acquisition module, a lip detection and localization module, a feature extraction module, and a speech recognition module.

First, corpus data are collected with the Kinect sensor. Second, in the data preprocessing phase, a three-dimensional model of the moving lips is constructed from the 3D facial coordinates obtained through the Kinect Face Tracking SDK. Using the correspondence between the CANDIDE-3 and MPEG-4 standard face models, the locations of 18 feature points in the lip region are then determined. In addition, 19 feature points around the lip region are added, and together these 37 points form the Region of Interest (ROI).

Next, four kinds of 3D spatial features are extracted from the 37 feature points in the ROI: (1) coordinate vector features, formed by the vectors from the coordinate origin to each feature point; (2) geometric proportion features, calculated from the lip contour; (3) lip angle features, selected using the K-Nearest Neighbors (KNN) classification algorithm; and (4) spatial angle features, chosen from the standard face model and customized according to the characteristics of lip motion. Together these features describe lip movement more comprehensively and effectively reduce the influence of the speaker's posture and orientation during data acquisition. The four spatial features are then normalized by piecewise linear interpolation, and further feature selection with the KNN classification algorithm yields the most representative features, which are combined to form the final lip-reading feature.

Finally, the KNN classification algorithm and ensemble learning methods are used in the classification experiments. The KNN experiments confirm the high efficiency and good real-time performance of the spatial lip-reading feature. Compared with the Bagging ensemble learning method, the KNN ensemble learning method achieves better classification accuracy and is much more suitable for a real-time lip-reading system.
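The abstract names the spatial feature types but does not give their formulas. As a rough illustration only, the sketch below computes three comparable quantities from 3D points: the flattened origin-to-point coordinate vectors, a spatial angle at a vertex point, and one geometric proportion (mouth opening height over mouth width). Which points act as lip corners, lip midpoints, or angle vertices is a hypothetical assumption here; the thesis derives its points from the CANDIDE-3/MPEG-4 correspondence, and its exact angles and proportions are not stated in this summary.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def coordinate_vectors(points: Sequence[Point3D]) -> List[float]:
    """Flatten the origin-to-point vectors of all feature points into one list."""
    return [c for p in points for c in p]


def spatial_angle(a: Point3D, vertex: Point3D, b: Point3D) -> float:
    """Angle in radians at `vertex` between the rays vertex->a and vertex->b."""
    u = [a[i] - vertex[i] for i in range(3)]
    v = [b[i] - vertex[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norms = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    # Clamp to [-1, 1] so floating-point drift cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot / norms)))


def height_width_ratio(left: Point3D, right: Point3D,
                       top: Point3D, bottom: Point3D) -> float:
    """Hypothetical proportion feature: mouth opening height over mouth width."""
    return math.dist(top, bottom) / math.dist(left, right)


# Hypothetical lip landmarks in camera space (units arbitrary).
left, right = (-3.0, 0.0, 0.0), (3.0, 0.0, 0.0)
top, bottom = (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)

print(coordinate_vectors([left, right]))
print(round(math.degrees(spatial_angle(top, left, bottom)), 2))  # 36.87, angle at the left corner
print(height_width_ratio(left, right, top, bottom))
```

Angles and distance ratios do not change when the whole face is rotated or translated, which fits the abstract's claim that these features reduce the influence of head posture; the raw coordinate vectors, by contrast, do change.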
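The summary says the features are "normalized by piecewise linear interpolation" without stating along which axis. One common reading, assumed here and not confirmed by the source, is that utterances of different durations are resampled to a fixed number of frames so that every sample yields a feature vector of the same length. The function name and the target length are hypothetical.

```python
from typing import List, Sequence


def resample_linear(frames: Sequence[Sequence[float]], target_len: int) -> List[List[float]]:
    """Resample per-frame feature vectors to `target_len` frames by piecewise
    linear interpolation along the time axis."""
    n = len(frames)
    if n == 0 or target_len < 1:
        raise ValueError("need at least one frame and target_len >= 1")
    if n == 1 or target_len == 1:
        # Nothing to interpolate between (or only one output slot): repeat frame 0.
        return [list(frames[0]) for _ in range(target_len)]
    out = []
    for j in range(target_len):
        t = j * (n - 1) / (target_len - 1)  # position of output j on the input time axis
        i = min(int(t), n - 2)               # left neighbour; keeps i + 1 in range at t = n - 1
        w = t - i                            # interpolation weight toward frame i + 1
        out.append([(1 - w) * a + w * b for a, b in zip(frames[i], frames[i + 1])])
    return out


# A 3-frame utterance with two features per frame, stretched to 5 frames.
print(resample_linear([[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]], 5))
# [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
```

The same function also shrinks longer utterances, so a fixed `target_len` gives every sample a feature vector of length `target_len` times the per-frame feature count.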
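The abstract states that the KNN classifier is used for feature selection but not which search strategy. A minimal sketch, assuming greedy forward selection scored by leave-one-out KNN accuracy (a stand-in for whatever procedure the thesis actually uses), is shown below. The function names, the default `k`, and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Majority label among the k training rows nearest to x, using Euclidean
    distance over the given feature indices only."""
    neighbours = sorted(
        ((math.sqrt(sum((row[f] - x[f]) ** 2 for f in features)), label)
         for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbours[:k])
    # On tied counts, most_common keeps first-seen order, so the nearer label wins.
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, features, k=3):
    """Leave-one-out accuracy of KNN restricted to `features`."""
    hits = 0
    for i in range(len(X)):
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k, features)
        hits += pred == y[i]
    return hits / len(X)


def forward_select(X, y, k=3):
    """Greedily add the feature that most improves leave-one-out accuracy;
    stop as soon as no remaining feature improves it."""
    selected, best_acc = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        cand_acc, cand = -1.0, None
        for f in remaining:
            acc = loo_accuracy(X, y, selected + [f], k)
            if acc > cand_acc:
                cand_acc, cand = acc, f
        if cand_acc <= best_acc:
            break
        selected.append(cand)
        remaining.remove(cand)
        best_acc = cand_acc
    return selected, best_acc


# Toy data: column 0 separates the classes, column 1 is noise.
X = [[0.0, 5.0], [0.1, 1.0], [0.2, 9.0], [1.0, 4.0], [1.1, 8.0], [1.2, 2.0]]
y = ["a", "a", "a", "b", "b", "b"]
print(forward_select(X, y, k=3))  # ([0], 1.0)
```

Because KNN needs no training step, each candidate subset is cheap to score, which is consistent with the abstract's emphasis on efficiency, though the cost still grows with the square of the number of samples.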
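The summary compares a "KNN ensemble learning method" with Bagging but does not define the former. One plausible construction, assumed here and not confirmed by the source, trains one KNN member per feature group (for example, one per spatial feature type) and combines the members by majority vote. The group names and column indices below are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Majority label among the k nearest training rows over `features`."""
    neighbours = sorted(
        ((math.sqrt(sum((row[f] - x[f]) ** 2 for f in features)), label)
         for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    return Counter(label for _, label in neighbours[:k]).most_common(1)[0][0]


def knn_ensemble_predict(train_X, train_y, x, groups, k=3):
    """Hypothetical KNN ensemble: one KNN member per feature group; the label
    predicted by the most members wins (ties go to the earlier group)."""
    member_votes = [knn_predict(train_X, train_y, x, k, cols) for cols in groups.values()]
    return Counter(member_votes).most_common(1)[0][0]


# Hypothetical grouping of feature columns by spatial feature type.
groups = {"coordinate_vectors": [0], "proportions": [1], "spatial_angles": [2]}
train_X = [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
train_y = ["a", "b"]
print(knn_ensemble_predict(train_X, train_y, [0.1, 0.2, 0.1], groups, k=1))  # a
```

Bagging would instead give every member all feature columns and train each on a bootstrap resample of `train_X`/`train_y` drawn with replacement. Nearest-neighbour classifiers are generally regarded as stable learners, so bootstrap resampling alone tends to add little diversity among members, whereas splitting by feature group makes the members see genuinely different views of the lip motion.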

Keywords/Search Tags:

Lip Reading, Spatial Angle Feature, KNN Ensemble Learning Method, Kinect Sensor
