Dynamic gesture recognition plays an important role in Augmented Reality, Human-Computer Interaction, Sign Language Recognition, and related fields. In recent years, deep learning has brought new vitality to pattern recognition and computer vision. However, current deep-learning-based dynamic gesture recognition algorithms still face the following problems: (1) the variable appearance of dynamic gestures and the random duration of gestures make recognition more difficult; (2) traditional regularization algorithms and data augmentation cannot effectively prevent overfitting when applied to spatio-temporal models and action data. To address these problems, this thesis carries out the following research and experiments:

1. To address problem (1), the thesis proposes an asynchronous spatio-temporal feature extraction method. First, an asynchronous spatio-temporal feature extraction module is built from a lightweight 3D convolutional network. This module extracts gesture features with multi-scale spatio-temporal characteristics, which maintains recognition accuracy for gestures that differ in appearance size and temporal rate. Next, an improved Long Short-Term Memory (LSTM) network is used to learn stable long-term features from the short-term asynchronous spatio-temporal features. Finally, the spatio-temporal features of all time steps are fused to produce the final dynamic gesture prediction.

2. To address problem (2), the thesis proposes a spatio-temporal drop regularization method called the Label-Guided Spatio-Temporal Drop strategy (LGST-Drop). It not only drops neurons in a structured way at the frame level, but also regularizes motion information along the channel and temporal dimensions. Moreover, the drop mask of LGST-Drop is generated from temporary labels provided by the network, which reduces the randomness of selecting drop regions and makes the spatio-temporal regularization process more stable.

Experimental comparisons with other mainstream methods show that the proposed model based on multi-temporal asynchronous spatio-temporal features significantly improves gesture recognition performance and yields stable results on several typical datasets. In addition, LGST-Drop is applied to a variety of recognition networks and compared experimentally with other typical regularization algorithms; the results show that LGST-Drop is highly competitive.
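The abstract does not detail how the short-term "asynchronous" features are sampled. One plausible reading, assumed here and not confirmed by the source, is that the frame sequence is cut into short clips at several temporal strides, so the same lightweight 3D CNN sees a gesture at different temporal rates. The hypothetical helper `multi_rate_clips` below only computes the frame indices of such clips; the 3D CNN and the LSTM are not shown.

```python
from typing import Dict, List, Sequence


def multi_rate_clips(num_frames: int, clip_len: int,
                     strides: Sequence[int]) -> Dict[int, List[List[int]]]:
    """Split a video of `num_frames` frames into non-overlapping short clips
    at several temporal strides (a hypothetical sampling scheme).

    Each clip holds `clip_len` frame indices. With stride s a clip covers
    clip_len * s raw frames, so larger strides span a longer time window
    with the same number of frames, which suits slower gestures.
    """
    if clip_len <= 0 or any(s <= 0 for s in strides):
        raise ValueError("clip_len and strides must be positive")
    clips: Dict[int, List[List[int]]] = {}
    for s in strides:
        span = clip_len * s  # number of raw frames covered by one clip
        starts = range(0, num_frames - span + 1, span)
        clips[s] = [list(range(t, t + span, s)) for t in starts]
    return clips


if __name__ == "__main__":
    for stride, windows in multi_rate_clips(16, 4, [1, 2]).items():
        print(stride, windows)
```

Under this assumption, each clip would be encoded into one short-term feature vector, and the resulting per-stride feature sequences (which have different lengths) would be passed on to the recurrent stage.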
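The abstract says the features of every time step are fused for the final prediction but does not name the fusion rule. A minimal sketch, assuming the fusion is a (optionally weighted) average of per-time-step class scores followed by an argmax; the function name `fuse_time_steps` and the score layout (T time steps by K classes) are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple


def fuse_time_steps(step_scores: Sequence[Sequence[float]],
                    weights: Optional[Sequence[float]] = None) -> Tuple[List[float], int]:
    """Fuse per-time-step class scores (T x K) into one score vector.

    Uses uniform averaging unless `weights` (one per time step) is given.
    Returns the fused scores and the index of the predicted class.
    """
    num_steps = len(step_scores)
    if num_steps == 0:
        raise ValueError("need at least one time step")
    if weights is None:
        weights = [1.0 / num_steps] * num_steps
    if len(weights) != num_steps:
        raise ValueError("one weight per time step is required")
    num_classes = len(step_scores[0])
    # Weighted sum over time, computed separately for each class.
    fused = [sum(w * scores[k] for w, scores in zip(weights, step_scores))
             for k in range(num_classes)]
    predicted = max(range(num_classes), key=lambda k: fused[k])
    return fused, predicted
```

Uniform averaging treats every time step as equally informative; learned or attention-based weights are a common alternative, and the `weights` argument leaves room for them.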
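The abstract states that LGST-Drop regularizes along the channel and temporal dimensions and derives its drop mask from temporary labels produced by the network, but it does not give the exact rule. The sketch below is a hypothetical stand-in, loosely in the spirit of attention-guided drop methods: it scores each frame and channel by its class-activation-style contribution to the network's current predicted label, drops the highest-scoring ones, and rescales the rest as inverted dropout does. The names `label_guided_drop_mask` and `apply_drop`, the top-k selection, and the spatially pooled T x C feature layout are all assumptions; within-frame spatial block drop is omitted.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def label_guided_drop_mask(features: Sequence[Sequence[float]],
                           class_weights: Sequence[Sequence[float]],
                           label: int,
                           drop_frames: int = 1,
                           drop_channels: int = 1) -> Matrix:
    """Build a T x C binary mask (hypothetical scheme, not the thesis's rule).

    features:      T x C per-frame channel activations (spatially pooled).
    class_weights: K x C weights of a linear classifier.
    label:         temporary label predicted by the network for this sample.
    The frames and channels contributing most to `label` are masked out.
    """
    num_t, num_c = len(features), len(features[0])
    w = class_weights[label]
    # CAM-style contribution of each (frame, channel) to the temporary label.
    contrib = [[features[t][c] * w[c] for c in range(num_c)] for t in range(num_t)]
    frame_score = [sum(row) for row in contrib]
    chan_score = [sum(contrib[t][c] for t in range(num_t)) for c in range(num_c)]
    top_frames = set(sorted(range(num_t), key=lambda t: frame_score[t],
                            reverse=True)[:drop_frames])
    top_chans = set(sorted(range(num_c), key=lambda c: chan_score[c],
                           reverse=True)[:drop_channels])
    # Temporal drop removes whole frames; channel drop removes whole channels.
    return [[0.0 if (t in top_frames or c in top_chans) else 1.0
             for c in range(num_c)] for t in range(num_t)]


def apply_drop(features: Sequence[Sequence[float]], mask: Matrix) -> Matrix:
    """Zero the masked units and rescale the rest, as in inverted dropout."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    scale = total / kept if kept else 0.0
    return [[f * m * scale for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(features, mask)]
```

In training, a mask like this would be applied to an intermediate feature map and disabled at inference. Selecting the most label-relevant units deterministically, rather than sampling drop regions uniformly at random, is one way to reduce the randomness of drop-region selection that the abstract refers to.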