| Action segmentation is a challenging task in computer vision that aims to divide an untrimmed long video of continuous actions into sub-action segments, each with its start and end time points. According to the input data format, current mainstream action segmentation methods fall into two types: feature-based methods, which use I3D two-stream features, and skeleton-based methods, which extract action features directly from skeleton keypoint information. Because experiments revealed some limitations of I3D feature extraction, this paper focuses on skeleton-based action segmentation and studies three methods designed specifically for skeleton data. First, an action segmentation method based on spatio-temporal graph convolution and dynamic time warping is studied, designed mainly for Tai Chi actions. This method has two steps. In the first step, a spatio-temporal graph convolutional network is trained on a self-made Tai Chi skeleton dataset and then applied to sliding-window inputs for action classification, which yields an initial action segmentation. In the second step, hand-crafted features are designed to represent Tai Chi actions from the positions of the skeleton keypoints; dynamic time warping is then applied, starting from the initial segmentation, to align the reference action feature curve with the sample action feature curve, and the action boundaries are redefined according to this alignment to produce the final segmentation. Experiments and analysis on the self-made Tai Chi dataset demonstrate the feasibility of this method for Tai Chi action segmentation. Second, a frame-level action segmentation method based on spatial graph convolution and cascaded networks is studied. This method combines spatial graph convolution with multi-stage cascaded temporal convolution, enabling the model to capture both the spatial motion information and the long-term temporal dependencies of continuous actions in skeleton data. Experiments and analysis on the PKU-MMDv2 and LARa datasets validate the effectiveness of this method. Finally, a frame-level action segmentation method based on dual dilated temporal convolution is studied, which improves the temporal convolution structure of the previous method (spatial graph convolution with cascaded networks). Specifically, dual dilated residual layers are introduced in the initial stage of the cascaded network, which enables the model to achieve higher frame-wise accuracy. This method is tested and analyzed on PKU-MMDv2, LARa, and the self-made Tai Chi dataset and compared with state-of-the-art methods; the experimental results show that it achieves outstanding performance. |
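
The first step of the Tai Chi method (sliding-window classification turned into an initial segmentation) can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the thesis implementation: the trained spatio-temporal GCN is replaced by a hypothetical `classify_window(start, end)` callable, every window votes for all frames it covers, ties go to the smaller label, and the helper names `sliding_window_labels` and `labels_to_segments` are ours.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def sliding_window_labels(num_frames: int, window: int, stride: int,
                          classify_window: Callable[[int, int], int]) -> List[int]:
    """Assign each frame the majority label of all windows that cover it.

    classify_window(start, end) is a hypothetical stand-in for the trained
    ST-GCN applied to frames [start, end).
    """
    if not 0 < stride <= window:
        raise ValueError("need 0 < stride <= window so every frame is covered")
    votes = [Counter() for _ in range(num_frames)]
    starts = list(range(0, max(num_frames - window, 0) + 1, stride))
    # Add a final window if the stride left the tail of the sequence uncovered.
    if starts[-1] + window < num_frames:
        starts.append(num_frames - window)
    for s in starts:
        e = min(s + window, num_frames)
        label = classify_window(s, e)
        for t in range(s, e):
            votes[t][label] += 1
    # Majority vote per frame; ties are broken in favor of the smaller label.
    return [min(v, key=lambda k: (-v[k], k)) for v in votes]


def labels_to_segments(labels: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Collapse frame labels into (label, start, end) segments, end exclusive."""
    segments = []
    start = 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segments.append((labels[start], start, t))
            start = t
    return segments
```

With a toy classifier that labels a window as action 0 when its center lies before frame 10 and as action 1 otherwise, 20 frames with window 6 and stride 2 give the initial segments `[(0, 0, 10), (1, 10, 20)]`.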
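
The boundary refinement in the second step of the Tai Chi method can be illustrated with classic dynamic time warping on one-dimensional feature curves. The sketch assumes each hand-crafted feature is a single scalar per frame (for example, a hypothetical wrist-height curve), uses the absolute difference as the local cost, and transfers each reference boundary to the first sample frame aligned with it; the abstract does not specify these choices, and the names `dtw_path` and `transfer_boundaries` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def dtw_path(ref: Sequence[float],
             sample: Sequence[float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW between two 1-D feature curves.

    Returns the total alignment cost and the warping path as
    (ref_frame, sample_frame) pairs ordered from start to end.
    """
    n, m = len(ref), len(sample)
    inf = float("inf")
    # cost[i][j] is the minimal cost of aligning ref[:i] with sample[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - sample[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the last cell, preferring the diagonal step on ties.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path


def transfer_boundaries(path: Sequence[Tuple[int, int]],
                        ref_boundaries: Sequence[int]) -> List[int]:
    """Map reference boundary frames onto sample frames (assumed rule:
    the first sample frame aligned with each reference boundary frame)."""
    first_match: Dict[int, int] = {}
    for r, s in path:
        first_match.setdefault(r, s)
    return [first_match[b] for b in ref_boundaries]
```

For `ref = [0, 0, 1, 1, 2, 2]` with boundaries at frames 2 and 4, and a slower sample `[0, 0, 0, 1, 1, 2, 2, 2]`, the alignment cost is 0 and the boundaries move to sample frames 3 and 5.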
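
The spatial graph convolution used by the second and third methods can be sketched per frame as `ReLU(A_hat X W)`, where `A_hat = D^{-1/2} (A + I) D^{-1/2}` is the normalized skeleton adjacency with self-loops, `X` holds one feature row per joint, and `W` is a learned weight matrix. This follows the common GCN formulation and is an assumption: ST-GCN style models usually split neighbors into several partitions with separate weights, which this single-partition, pure-Python sketch omits. The toy three-joint chain and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def normalized_adjacency(num_joints: int, edges: Sequence[Tuple[int, int]]) -> Matrix:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # degrees include the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain dense matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def spatial_graph_conv(x: Matrix, a_hat: Matrix, w: Matrix) -> Matrix:
    """One frame: ReLU(A_hat X W); X is joints x in_channels, W is in x out."""
    out = matmul(matmul(a_hat, x), w)
    return [[max(0.0, v) for v in row] for row in out]
```

On a chain of three joints, a unit signal on joint 0 spreads to its neighbor (joint 1) but not yet to joint 2, which is why graph convolutions are stacked to reach distant joints.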
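
The dual dilated residual layer of the third method can be illustrated with a single-channel toy version. The sketch assumes an MS-TCN++ style design: in layer `l` of `L`, one branch uses dilation `2^l` and the other `2^(L-1-l)`, the two branch outputs are fused by a 1x1 convolution and a ReLU, passed through another 1x1 convolution, and added back to the input. Whether the thesis uses exactly this layout is an assumption, as are the scalar weights `fuse` and `out_w` standing in for the 1x1 convolutions; dropout and multiple channels are omitted, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Kernel-size-3 dilated convolution with zero padding; output length equals input length."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for offset, w in zip((-1, 0, 1), kernel):
            idx = t + offset * dilation
            if 0 <= idx < len(x):  # zero padding outside the sequence
                acc += w * x[idx]
        out.append(acc)
    return out


def dual_dilated_layer(x: Sequence[float], layer: int, num_layers: int,
                       k1: Sequence[float], k2: Sequence[float],
                       fuse: Tuple[float, float] = (1.0, 1.0),
                       out_w: float = 1.0) -> List[float]:
    """Single-channel dual dilated residual layer (MS-TCN++ style, assumed)."""
    if not 0 <= layer < num_layers:
        raise ValueError("layer must be in [0, num_layers)")
    d1 = 2 ** layer                   # dilation grows with depth
    d2 = 2 ** (num_layers - 1 - layer)  # dilation shrinks with depth
    a = dilated_conv1d(x, k1, d1)
    b = dilated_conv1d(x, k2, d2)
    # Stand-in for a 1x1 conv over the two concatenated branches, then ReLU.
    fused = [max(0.0, fuse[0] * u + fuse[1] * v) for u, v in zip(a, b)]
    # Stand-in for the final 1x1 conv, followed by the residual connection.
    return [xi + out_w * f for xi, f in zip(x, fused)]
```

For an impulse at frame 4 of a 9-frame signal, layer 0 of 3, and all-ones kernels, the output is `[1.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 1.0]`: the dilation-1 branch spreads the impulse to its immediate neighbors, while the dilation-4 branch already reaches frames 0 and 8 in the first layer. Pairing a small and a large dilation in every layer is what lets the initial stage mix local and long-range temporal context at all depths.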