| With the continuing development of computer vision, analyzing sports videos with visual perception technology has become a popular research topic. In sports video analysis, the accuracy of human pose estimation and action detection is critical to the subsequent analysis. However, mutual occlusion, complex poses, and fast motion in sports videos limit the applicability of existing human pose estimation algorithms. The contributions of this thesis are as follows. (1) Current studies on human pose estimation based on deep neural networks focus mainly on network design and pay less attention to modeling keypoints and human body structure information. Although these studies achieve good results, they remain limited from the perspective of human structure modeling. To address this shortcoming, this thesis proposes the context attention-based keypoint extraction network (CAKENet). CAKENet introduces a keypoint context attention mechanism (KCAM) to model the dependencies between keypoints, thereby reducing false detections on difficult samples. To generate more difficult samples, we propose Self-Data Augmentation (SDA). The proposed CAKENet achieves a human pose estimation accuracy of 79.5% on the COCO dataset. (2) To deploy human pose estimation on mobile devices and embedded platforms, recent research on lightweight human pose estimation has focused mainly on designing ever lighter networks. However, the accuracy of these methods is too low to meet the needs of practical applications. This thesis proposes HRNeXt, an efficient lightweight human pose estimation algorithm obtained by replacing standard convolutions with ResNeXt blocks. HRNeXt achieves a pose estimation accuracy of 78.2% on the COCO dataset, striking an effective balance between accuracy and network complexity. (3) To analyze sports video with a pose estimation algorithm, the first step is to extract key frames from the video, which eliminates redundant and repetitive data and improves analysis efficiency. Because continuous actions have highly similar visual features, detecting them is difficult. In addition, inconsistent shooting perspectives caused by camera movement lower the accuracy of traditional action detection algorithms. To address these problems, this thesis proposes STADNet, an action detection network that incorporates spatial and temporal information by combining the spatial information extraction network EfficientNet with a bidirectional gated recurrent unit (GRU) for temporal information extraction. STADNet achieves an action detection accuracy of 74.9% on the GolfDB dataset. (4) Building on HRNeXt and STADNet, this thesis designs a sports video analysis system based on human pose estimation and action detection, which enables scientific evaluation of athletes’ actions. Overall, the proposed methods improve the recognition accuracy of human pose when visual information is missing, reduce the number of parameters and the computational cost of the network, and improve the detection accuracy of actions with similar visual features. |
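
ResNeXt is built around grouped convolutions, so the reduction in parameters mentioned in item (2) can be illustrated with a simple weight count. The following is a minimal sketch and not the HRNeXt configuration: the channel width (128), the cardinality (32), and the helper name `conv_params` are assumptions chosen for illustration, and only a single 3x3 layer is counted, without the 1x1 layers that surround it in a full ResNeXt block.

```python
def conv_params(c_in, c_out, kernel, groups=1):
    """Number of weights in a 2D convolution layer (bias omitted).

    With `groups` > 1, each output channel only sees c_in / groups
    input channels, which is what makes grouped convolution cheaper.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * kernel * kernel


# Hypothetical layer sizes, not taken from the thesis.
standard = conv_params(128, 128, 3)             # ordinary 3x3 convolution
grouped = conv_params(128, 128, 3, groups=32)   # grouped 3x3 convolution

print(standard, grouped, standard // grouped)
```

Under these assumed sizes the grouped layer needs 32 times fewer weights than the standard one, since the saving of a single layer scales with the number of groups.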