With the rapid development of China's economy, science and technology, driverless technology is gradually being applied in people's daily lives. Given China's complex roads and environmental conditions, fixed traffic signal lights alone can hardly meet practical demands. In addition, traffic police gesture recognition has long been an active topic in computer vision that attracts many researchers. Research on dynamic traffic police gesture recognition therefore has practical significance. However, the accuracy of traffic police gesture recognition is easily affected by complex environments, changes in light intensity, and occlusion. To address these problems, this thesis proposes a dynamic traffic police gesture recognition method based on spatio-temporal feature fusion. The method builds on the Convolutional Pose Machine (CPM) and improves it to address problems encountered during the research. First, the feature extraction module is optimized with a residual structure, Channel Split, and Channel Shuffle, which allows it to extract image features more effectively. Second, CPM consists of multiple stages, each of which generates human key points, and the key points generated in later stages depend on those of earlier stages. The key points produced in the first stage are therefore critical. This thesis adds an improved Inception4d structure, a typical multi-scale feature fusion network, to the first stage of CPM, so that the network can process feature information at multiple scales and locate human key points more accurately. Third, to capture better temporal information, a gated recurrent unit (GRU) module based on an attention mechanism is constructed. It assigns each video frame a weight score that represents the frame's importance, so the model can focus on key frames and reduce the influence of redundant frames, allowing the GRU to extract better temporal information with limited resources. Finally, traffic police gestures are recognized by fusing the spatio-temporal feature information. Human key point training is conducted on the AI Challenger dataset, and traffic police gesture recognition is evaluated on the China traffic police gesture dataset. The recognition accuracy reaches 93.7%, which is 2.95% higher than that of the network before the improvements. In particular, the recognition accuracy in some complex environments also improves to a certain extent, indicating the advantages of the improved network model.
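The frame-weighting idea described above (one importance score per video frame, then emphasizing key frames over redundant ones) can be sketched as attention pooling over per-frame GRU hidden states. This is a minimal illustration under stated assumptions, not the thesis implementation: the abstract does not specify the scoring function, so an additive-style score e_t = v · tanh(W h_t) followed by a softmax is assumed here. The function `attention_pool` and the parameters `W` and `v` are hypothetical, and the GRU itself is omitted; its hidden states are taken as given inputs.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden_states: Sequence[Vector],  # one (assumed) GRU hidden state per video frame
    w: Sequence[Vector],              # hypothetical projection matrix W, shape (d_a, d_h)
    v: Sequence[float],               # hypothetical scoring vector v, length d_a
) -> Tuple[Vector, Vector]:
    """Score each frame with e_t = v . tanh(W h_t), turn the scores into weights
    alpha_t with softmax, and return (alpha, sum_t alpha_t * h_t)."""
    scores = []
    for h in hidden_states:
        projected = [math.tanh(sum(wij * hj for wij, hj in zip(row, h))) for row in w]
        scores.append(sum(vi * pi for vi, pi in zip(v, projected)))
    alpha = softmax(scores)  # per-frame importance weights, summing to 1
    dim = len(hidden_states[0])
    # Weighted sum of hidden states: frames with larger alpha dominate the summary.
    context = [sum(a * h[k] for a, h in zip(alpha, hidden_states)) for k in range(dim)]
    return alpha, context


if __name__ == "__main__":
    # Three toy frames with 2-D hidden states; the middle frame has the largest activations.
    frames = [[0.1, 0.0], [2.0, 1.5], [0.2, 0.1]]
    W = [[1.0, 0.0], [0.0, 1.0]]  # identity projection, chosen only for illustration
    v = [1.0, 1.0]
    weights, summary = attention_pool(frames, W, v)
    print([round(a, 3) for a in weights])
    print([round(x, 3) for x in summary])
```

In this toy run the middle frame receives the largest weight, so it contributes most to the pooled temporal feature. With a zero scoring vector every frame gets the same weight and the pooling reduces to a plain average, which is the behaviour the attention mechanism is meant to improve on.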