| Human actions are one of the most intuitive ways for people to express their true intentions. Accurate action recognition helps computers understand the information that humans convey, which in turn supports more immersive human-computer interaction systems. As deep learning technology has matured, action recognition in video has become one of the key applications of computer vision. Yoga is a popular and convenient form of aerobic exercise that improves immune function and relieves anxiety. Many people learn yoga on their own from resources found on the Internet, but non-standard postures can cause joint damage, defeating the purpose of the exercise. The goal of studying yoga action recognition is therefore to improve recognition accuracy and to use existing resources to combine artificial intelligence with sports, promoting the development of intelligent sports. To improve the accuracy of action recognition in yoga videos, this thesis builds on the traditional two-stream convolutional network, incorporates a residual structure, and proposes a spatial-temporal fusion residual network (STF-ResNet) for yoga action recognition in complex scenes. The main work of this thesis is as follows. (1) Dataset acquisition and construction. Existing public action recognition datasets, such as HMDB51 and UCF101, are based on everyday human actions, and there is no public dataset of basic yoga actions. This thesis collects and processes yoga action videos from multiple sources to build a yoga action dataset. (2) The spatial-temporal fusion residual network (STF-ResNet). The RGB frames and optical flow of the target region are fed into STF-ResNet to extract the spatial and temporal features of the video. The spatial-stream and temporal-stream features are fused through residual connections so that they complement each other, and low-level features compensate for the information lost in the high-level features. A convolutional block attention module (CBAM) is inserted before fusion to further filter yoga action features along both the channel and spatial dimensions. Experiments show that the proposed model improves average recognition accuracy by 6.3% over the traditional two-stream convolutional neural network, and the method also performs well on public datasets. (3) Based on the STF-ResNet model, this thesis designs and implements a yoga action recognition system. The system combines algorithm analysis and data analysis to run detection on the existing and simulated datasets, and it can effectively identify yoga actions. Users upload videos of basic yoga actions, and the system provides action recognition and evaluation services that help practitioners find deficiencies and improve their movements, which is valuable in practical applications. |
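The fusion step in (2), with CBAM applied to each stream before the streams are mixed and low-level features added back through a residual path, can be illustrated with a minimal NumPy sketch. This is not the STF-ResNet implementation: the function names (`cbam`, `fuse`), the element-wise-sum fusion, the assumption that the low-level feature map has already been projected to the same `(C, H, W)` shape, and the random weights are all hypothetical choices made for illustration. The CBAM part follows the standard design (channel attention from average- and max-pooled descriptors through a shared two-layer MLP, then spatial attention from a convolution over channel-pooled maps).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(f, w1, w2):
    # f: (C, H, W); w1: (C // r, C), w2: (C, C // r) form a shared two-layer MLP.
    avg = f.mean(axis=(1, 2))                      # global average pooling -> (C,)
    mx = f.max(axis=(1, 2))                        # global max pooling -> (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)   # ReLU hidden layer
    weights = sigmoid(mlp(avg) + mlp(mx))          # per-channel weights in (0, 1)
    return f * weights[:, None, None]

def spatial_attention(f, kernel):
    # kernel: (2, k, k) filter applied to the stacked [mean; max] channel-pooled maps.
    pooled = np.stack([f.mean(axis=0), f.max(axis=0)])     # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(pooled, ((0, 0), (pad, pad), (pad, pad)))  # zero padding
    h, w = f.shape[1:]
    attn = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # "Same" cross-correlation, as in deep-learning convolution layers.
            attn[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return f * sigmoid(attn)[None, :, :]

def cbam(f, w1, w2, kernel):
    # Channel attention first, then spatial attention.
    return spatial_attention(channel_attention(f, w1, w2), kernel)

def fuse(spatial_feat, temporal_feat, low_feat, params_s, params_t):
    # Hypothetical fusion: refine each stream with its own CBAM, sum the streams,
    # then add low-level features as a residual to compensate for lost detail.
    refined_s = cbam(spatial_feat, *params_s)
    refined_t = cbam(temporal_feat, *params_t)
    return refined_s + refined_t + low_feat

rng = np.random.default_rng(0)
C, H, W, r = 8, 6, 6, 2
rgb_feat = rng.standard_normal((C, H, W))     # stand-in for spatial-stream features
flow_feat = rng.standard_normal((C, H, W))    # stand-in for temporal-stream features
low_feat = rng.standard_normal((C, H, W))     # stand-in for projected low-level features
make_params = lambda: (rng.standard_normal((C // r, C)),
                       rng.standard_normal((C, C // r)),
                       rng.standard_normal((2, 7, 7)))
fused = fuse(rgb_feat, flow_feat, low_feat, make_params(), make_params())
print(fused.shape)
```

Summing rather than concatenating keeps the channel count unchanged, which is what lets the low-level residual be added directly; a real network would learn the weights and would typically need a projection (for example a 1×1 convolution) to match the low-level feature shape.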