Gestures are an important means of human communication besides language, and a common tool in human-machine interaction. Traditional wearable gesture recognition methods require complex and expensive devices that the user must wear and configure properly, which makes them inconvenient to use. In addition, earlier vision-based static gesture recognition relies on the external shape and contour of the hand. It therefore supports only a limited set of gesture categories and adapts poorly to complex environments. Dynamic gestures carry richer and more precise information. They can broaden the scope of human-machine interaction, letting humans control machines more naturally and conveniently while letting machines understand human intentions directly and serve humans better. Vision-based dynamic gesture recognition has therefore become an active research topic in human-computer interaction.

This paper studies dynamic gesture recognition using a monocular camera. First, based on an analysis of human skeleton features and gesture composition, a general gesture model is established that combines skeleton features with the gesture components (head, left hand, and right hand). Second, high-precision extraction of the spatial features of human gestures is achieved by drawing on the ideas of high-resolution networks, joint embedding, multi-scale feature extraction, and anchor boxes. Finally, an LSTM network is introduced to extract the temporal features of dynamic gestures, and a dynamic gesture recognizer is designed that fuses spatial context features with temporal features. The network is trained on the public dataset ‘AI-challenger’ and the self-built dataset ‘gesture components’, and is applied to traffic police gesture recognition. The main work and results of this paper are as follows.

(1) By combining the spatial features of the human skeleton with the appearance features of the gesture components, a general human gesture model is proposed that is robust to interference and easily transferable to various gesture interaction scenarios.

(2) A human joint point extraction network based on Gaussian heatmaps is built on a high-resolution backbone, and an appearance feature extraction network for the gesture components is built using techniques such as multi-scale feature extraction. Together, these networks achieve high-accuracy extraction of the spatial features of interactive human gestures, such as skeleton (bone) lengths, inter-skeleton angles, and encodings of the gesture components' appearance features.

(3) An LSTM network is introduced to extract the temporal features of dynamic gestures, and on this basis a general dynamic gesture recognition architecture is designed. These results are then applied to traffic police command gesture recognition: a traffic police gesture recognizer is designed and implemented on a mobile device. Experiments on the public traffic police gesture recognition dataset yield an accuracy of 98.72%. The recognizer also shows strong robustness to interference and adapts well to changes in illumination and complex backgrounds.
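The abstract names Gaussian-heatmap joint extraction and per-frame spatial features (bone lengths and inter-bone angles) without giving formulas. The following is a minimal sketch, not the paper's implementation, showing how such quantities are commonly computed. It covers building a Gaussian target heatmap for one joint, decoding a joint position by argmax, and deriving bone lengths and angles from 2D keypoints. The joint names, the four-bone arm set, the shoulder-width normalization, `sigma=2.0`, and every function name are hypothetical choices made for illustration only.

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Target heatmap with a 2D Gaussian peak (value 1.0) at joint (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def decode_peak(heatmap):
    """Recover the (x, y) pixel with the maximum response (plain argmax decoding)."""
    _, x, y = max((v, x, y) for y, row in enumerate(heatmap)
                  for x, v in enumerate(row))
    return x, y

# Hypothetical bone set: each bone is (start joint, end joint).
BONES = {
    "r_upper_arm": ("r_shoulder", "r_elbow"),
    "r_forearm": ("r_elbow", "r_wrist"),
    "l_upper_arm": ("l_shoulder", "l_elbow"),
    "l_forearm": ("l_elbow", "l_wrist"),
}

def bone_vector(joints, bone):
    """Direction vector from the bone's start joint to its end joint."""
    a, b = BONES[bone]
    return (joints[b][0] - joints[a][0], joints[b][1] - joints[a][1])

def bone_length(joints, bone):
    return math.hypot(*bone_vector(joints, bone))

def bone_angle(joints, bone1, bone2):
    """Angle in degrees between two bone vectors (0 = same direction)."""
    u, v = bone_vector(joints, bone1), bone_vector(joints, bone2)
    norm = math.hypot(*u) * math.hypot(*v)
    if norm == 0:
        return 0.0  # degenerate bone: treat the angle as undefined -> 0
    cos = (u[0] * v[0] + u[1] * v[1]) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def frame_features(joints):
    """Per-frame vector: bone lengths scaled by shoulder width, then elbow angles."""
    scale = math.dist(joints["l_shoulder"], joints["r_shoulder"]) or 1.0
    lengths = [bone_length(joints, b) / scale for b in BONES]
    angles = [bone_angle(joints, "r_upper_arm", "r_forearm"),
              bone_angle(joints, "l_upper_arm", "l_forearm")]
    return lengths + angles
```

Computing `frame_features` for every frame of a clip yields a sequence of fixed-length vectors. A sequence like this is what a temporal model such as the LSTM described above would consume; that model is not sketched here. Dividing by shoulder width is one simple way to make lengths independent of the person's distance from the camera.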