| In recent years, research on human action recognition has made remarkable progress, and the technology has been widely applied in areas such as video surveillance and autonomous driving. Among these works, human action recognition based on deep learning has developed rapidly. When sufficient labeled data is available, supervised learning can achieve satisfactory recognition performance. However, the diversity of motion types and the complexity of video backgrounds make action labeling labor-intensive, which seriously limits the application of supervised human action recognition methods in practical scenarios. In contrast to these traditional learning approaches, action recognition based on few-shot and zero-shot learning has recently attracted great attention from researchers. In this paper, we study and implement few-shot action recognition and zero-shot action recognition. (1) Few-shot action recognition research focuses on learning the correlations between different kinds of actions. Traditional RGB-based features can be disturbed by complex background information, which reduces their discriminative power. To address this defect, this paper proposes DFSAR, a few-shot action recognition model based on a two-stream architecture. By introducing optical flow features, the model effectively attenuates the influence of complex background information on feature discrimination. Our main contributions are as follows: we introduce optical flow features into few-shot action recognition and fuse them with RGB-based spatiotemporal features so that the correlation between distinct actions can be effectively captured; we propose a novel end-to-end trainable two-stream architecture (DFSAR) for few-shot action recognition; and experimental results on the HMDB51 and UCF101 datasets show that the model achieves some improvement in few-shot action recognition over existing methods. (2) Our research on zero-shot action recognition is 
mainly focused on learning the correlation between actions and their labels. The state-of-the-art approach is to establish a visual-semantic joint embedding space from labeled data and then use it to recognize unseen actions. However, most existing methods build the visual space directly from a pre-trained spatiotemporal feature extraction model, so the salient factors that reflect the action-label correlation in the video sequence are overwhelmed. To address this issue, this paper proposes ADZSAR, an attention-based zero-shot action recognition model. The main contributions and novelties are as follows: we propose a new feature extraction method based on the attention mechanism, which automatically and adaptively highlights the salient factors in the features of different kinds of actions, so that a non-redundant visual space can be constructed; we introduce a state-of-the-art semantic embedding model to effectively extract the semantic embedding information of class labels; and experiments on the HMDB51 and UCF101 datasets show that this method performs best among existing spatiotemporal feature-based zero-shot action recognition methods. (3) Based on the above research, we design and implement a few-shot and zero-shot action recognition system. The system is built with the PyTorch deep learning framework. By deploying the pre-trained models on a server, few-shot and zero-shot action recognition are realized. The system mainly consists of a data preprocessing module, a model processing module, and a result display module. It runs stably and achieves remarkable classification performance, which demonstrates the practicability and effectiveness of the proposed few-shot and zero-shot action recognition models in practical scenarios. |
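
The general idea of classifying in a visual-semantic joint embedding space can be illustrated with a minimal sketch. This is not the ADZSAR model itself: it assumes the video has already been projected into the same space as the label embeddings, and it simply picks the label whose embedding is closest by cosine similarity. The function names, the three-dimensional vectors, and the label set are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(video_embedding, label_embeddings):
    """Return the label whose semantic embedding is most similar to the video.

    video_embedding: visual feature already projected into the joint space
                     (assumed to be produced by some trained model).
    label_embeddings: dict mapping each candidate class name to its semantic
                      embedding (toy values here, not real word vectors).
    """
    return max(
        label_embeddings,
        key=lambda name: cosine(video_embedding, label_embeddings[name]),
    )


# Toy embeddings for classes that had no labeled training videos (hypothetical).
unseen_label_embeddings = {
    "archery": [0.9, 0.1, 0.0],
    "juggling": [0.1, 0.9, 0.2],
    "surfing": [0.0, 0.2, 0.9],
}

# Hypothetical projected feature of a test video.
projected_video = [0.2, 0.8, 0.3]

predicted = zero_shot_predict(projected_video, unseen_label_embeddings)
print(predicted)
```

Because prediction only compares embeddings, new action classes can be added by supplying their label embeddings, without any labeled videos of those classes.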