| Although action recognition research has achieved significant results, traditional methods often struggle to perform well in specialized fields, largely because datasets of adequate scale are unavailable. Training an effective model from a small number of samples has therefore become a problem of practical value. Few-shot action recognition addresses this problem by aiming for accurate action recognition when only limited labeled samples are available. Current few-shot action recognition methods are mainly based on metric learning, in which an unknown action is classified by measuring its similarity to known action categories. Within this framework, most existing methods concentrate on building more efficient and accurate feature extractors and matchers, and often neglect the challenges posed by limited samples and single-modality information. In addition, because the task is relatively new, most researchers have focused on improving model performance on publicly available datasets and have overlooked applications in specific task scenarios. To address the problem of limited samples and single-modality information, this paper proposes a few-shot action recognition method based on multi-modal fusion. In addition to the common RGB modality, the method introduces several further modalities (depth, skeleton, optical flow, and temporal gradient) as supplementary information. Inspired by the feature pyramid model, we find that multi-modal information can be fused in a multi-layer complementary manner, and accordingly propose a shift-combined multi-modal fusion module. We validated the proposed method on public datasets. The experimental results show that on the HMDB51 dataset our method improves accuracy by 1.3% over the best-performing existing method, and on the UCF101 dataset accuracy likewise increases by 1.3%. To address the insufficient attention that existing few-shot action recognition work pays to specific task scenarios, we investigate the application of few-shot action recognition to dance action recognition. Because dance actions are updated rapidly and data collection is costly, this paper applies few-shot action recognition techniques to dance action recognition in order to overcome the recognition difficulties caused by the limited number of samples. Since dance action recognition is still in its infancy and high-quality dance action datasets are lacking, we constructed a fine-grained dance action dataset for Chinese Classical Dance Body Charm. The dataset features 15 professional dancers recorded by 6 surrounding cameras and comprises 3171 dance performances and 19026 dance videos, covering 50 fine-grained dance actions classified by professional dance experts. It is 2.3 times the size of AIST++, currently the largest dance dataset, and 6.7 times the size of the YouTube-Dance3D dataset. We evaluated the few-shot action recognition method on this dataset, and the results show that the model performs well on it and adapts well to the dance action recognition task. |