| In recent years, computer vision technology has developed rapidly, and action detection has attracted considerable attention. Human skeleton data is robust to complex backgrounds and environmental changes, so it is widely applicable in fields such as intelligent surveillance. To obtain motion features that represent human behavior from skeleton sequences, it is crucial to model the changes of those sequences in the temporal and spatial dimensions correctly and rationally. This thesis addresses the insufficient expression of spatio-temporal features in skeleton sequences and carries out in-depth research on action detection algorithms and applications based on hybrid models. The main research contents and contributions of this thesis are summarized as follows: (1) To address feature over-smoothing and insufficient attention to behavioral key frames in current spatio-temporal feature modeling of skeleton sequences, this thesis proposes an improved key-frame spatio-temporal graph attention model. The key frame extraction module selects frames with high motion information from the input skeleton data, so that the model focuses on the more important frames, which addresses the insufficient attention to behavioral key frames; the spatio-temporal feature extraction module uses a graph attention module to learn the spatial dependencies between adjacent nodes effectively, which prevents feature over-smoothing. Experiments on the skeleton-based behavior recognition datasets NTU RGB+D 60 and Kinetics-Skeleton 400 verify that the spatio-temporal features modeled by the key-frame spatio-temporal graph attention model better represent the behaviors performed in continuous skeleton sequences, with a performance improvement of 1.4%. (2) Skeleton-based behavior detection methods involve many manually designed parts, which makes models inflexible and insensitive to changes in the temporal domain. To address this, this thesis proposes a fine-coarse feature sampling Transformer model (FCTR) built on the spatio-temporal features extracted by the spatio-temporal graph attention model. The fine-coarse feature sampling module produces a sampled feature set that enhances the model's perception of salient motion features. The encoder and decoder model the correlations among features and between features and behavior instance queries, which avoids hand-designed anchors and makes the model more flexible. Experimental results on the PKU-MMD dataset demonstrate that FCTR improves detection performance compared with current mainstream algorithms. (3) Building on this theoretical research, this thesis designs and implements a Web-based intelligent monitoring system for gas stations. The system provides all-weather supervision of defined areas in gas stations and real-time monitoring of abnormal behaviors such as smoking and making phone calls, reducing the likelihood of accidents. |
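
The idea behind the key frame extraction module in contribution (1), selecting frames that carry high motion information, can be illustrated with a minimal sketch. The abstract does not specify how motion information is measured, so the criterion below (summed joint displacement between consecutive frames) and the names `motion_energy` and `select_key_frames` are assumptions made for illustration only, not the thesis's actual method.

```python
import math


def motion_energy(sequence):
    """Per-frame motion energy of a skeleton sequence.

    `sequence` is a list of frames; each frame is a list of joint
    coordinates (tuples). The energy of a frame is the summed Euclidean
    displacement of its joints relative to the previous frame.
    This criterion is an assumption, not taken from the thesis.
    """
    energies = [0.0]  # the first frame has no predecessor
    for prev, curr in zip(sequence, sequence[1:]):
        energies.append(sum(math.dist(p, c) for p, c in zip(prev, curr)))
    return energies


def select_key_frames(sequence, k):
    """Return the indices of the k highest-energy frames, in temporal order."""
    energies = motion_energy(sequence)
    ranked = sorted(range(len(sequence)), key=lambda i: energies[i], reverse=True)
    return sorted(ranked[:k])


# Toy sequence: 5 frames, 2 joints per frame, 3D coordinates.
toy_sequence = [
    [(0, 0, 0), (1, 0, 0)],
    [(0, 0, 0), (1, 0, 0)],
    [(3, 4, 0), (1, 0, 0)],
    [(3, 4, 0), (1, 0, 0)],
    [(3, 4, 0), (1, 0, 2)],
]
key_frames = select_key_frames(toy_sequence, k=2)
```

Returning the selected indices in temporal order keeps the reduced sequence usable by a downstream spatio-temporal model, which still depends on frame ordering.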