| In recent years, research in computer vision has increasingly shifted its attention from images to videos. A series of sub-tasks has been designed to help algorithms understand video content better, including tasks centered on human activities such as action recognition, action localization and action prediction. With the advancement of deep learning, the accuracy of action recognition has improved further. Action anticipation, however, remains a hard problem because anticipating the future is inherently uncertain, and how to better model the action patterns in video is still an open question. Rather than proposing a new dataset, this work annotates and analyzes existing data, extending and re-examining an existing dataset in order to study whether the anticipation task is solvable and to expose problems intrinsic to the dataset. In the sampled data, we found that about 70% of the samples are predictable, and that the accuracy of human predictions is roughly the same as that of the baseline model, at about 40%. We then compared the human and model results in detail through case analyses and identified possible structural patterns in the action sequences of the videos, which gives a deeper understanding of both the dataset and the task. Second, we analyzed the task's metrics and the related experimental phenomena in detail and pointed out the limitations of the recall metric. Finally, to account for the uncertainty in anticipation, we fine-tuned the model using the manually labeled anticipation results as training data, which improved top-5 recall by about 2 percentage points without a substantial drop in top-5 accuracy. With a smaller model and shorter inputs, our method achieves the second-best result on the validation set. This work provides a fundamental and thorough analysis of the anticipation task and of EPIC-KITCHENS-100. It lays the groundwork for better analysis of experimental results, with the aim of informing research on anticipation and even recognition, and of accelerating the revision and iteration of the dataset and the task setting. |
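The abstract contrasts top-5 recall with top-5 accuracy. As a hedged illustration, assuming the recall in question is a class-mean top-k recall (per-class recall averaged over classes, as in the EPIC-KITCHENS-100 anticipation benchmark), the sketch below shows how the two metrics can diverge: accuracy weights every sample equally, so frequent classes dominate it, while class-mean recall weights every class equally. The function names and the toy data are hypothetical and are not taken from this work.

```python
from collections import defaultdict


def top_k(scores, k=5):
    """Return the indices of the k highest-scoring classes, best first."""
    return sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]


def top_k_accuracy(all_scores, labels, k=5):
    """Fraction of samples whose ground-truth class is in the top-k predictions."""
    hits = sum(1 for s, y in zip(all_scores, labels) if y in top_k(s, k))
    return hits / len(labels)


def mean_top_k_recall(all_scores, labels, k=5):
    """Top-k recall computed per ground-truth class, then averaged over classes.

    Assumption: only classes that occur in `labels` are averaged over.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for s, y in zip(all_scores, labels):
        totals[y] += 1
        if y in top_k(s, k):
            hits[y] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical toy data: 3 classes, 5 samples, class 0 is the most frequent.
# The model ranks the classes the same way for every sample: 0, then 1, then 2.
scores = [[0.6, 0.3, 0.1]] * 5
labels = [0, 0, 0, 1, 2]

print(top_k_accuracy(scores, labels, k=2))     # 0.8
print(mean_top_k_recall(scores, labels, k=2))  # 0.6666666666666666
```

With k = 2 the single sample of class 2 is always missed: top-2 accuracy is 4/5 = 0.8, while class-mean top-2 recall is (1 + 1 + 0)/3 ≈ 0.67. Because the rare class counts as much as the frequent one, the two numbers can move independently, which is one way a change can raise recall without a matching change in accuracy.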