| Video is the mainstream form of media communication in today’s society. Automatic recognition of behaviors performed by groups of multiple individuals in videos has broad application prospects in intelligent surveillance, intelligent transportation, sports event analysis, and other fields. However, the complexity of video content and of the relationships among individuals poses great challenges for group behavior recognition. This thesis uses deep learning methods, including graph convolutional networks, a self-attention module, and a motion feature extraction module, to address the drawbacks of existing methods for group behavior recognition. The research is divided into the following three parts. To address the redundant interference that similar individual action information introduces into group behavior recognition, this thesis proposes a sub-group fusion method based on a multi-scale graph convolutional network. Abstracting multiple individuals into a small number of sub-groups step by step helps reduce redundant information among similar individuals. Moreover, fusing the sub-group features at the same scale from before and after graph convolution enriches the semantic information of the sub-group features. This method achieves a group behavior recognition accuracy of 91.62% on The Volleyball Dataset, which is 0.22 percentage points higher than the baseline. Because the receptive field of a graph convolutional network is limited, long-distance individual relation information may be lost, while the self-attention mechanism is not well suited to extracting individual relation features from graph structures. The thesis therefore designs a network structure that combines a graph convolution module with a self-attention module. The local individual relation features extracted by the graph convolution module and the long-range dependencies among individual node features captured by the self-attention module are fully 
integrated in different ways to enrich the representation of group features. On the public datasets The Collective Activity Dataset and The Collective Activity Extended Dataset, this method achieves group behavior recognition accuracies of 91.58% and 93.03%, which are 0.58 and 0.16 percentage points higher than the baseline, respectively. Traditional methods are limited in that they construct individual relation graphs using only the spatial feature dependencies among individuals and neglect the associations in motion information among individuals, which change over time. To address this defect, the thesis proposes an algorithm that constructs an improved individual relation graph based on motion features. First, a motion feature extraction module is designed that is suited to extracting the motion features of multiple individuals in a group. Then, the construction of the individual relation graph is improved by calculating relation values between the motion features of individuals, so that the graph reflects the actual interactions among individuals in the group more faithfully. Finally, the motion feature extraction module is embedded into the individual relation graph network to complete group behavior recognition. Experimental results show that the improved individual relation graph network based on motion features achieves group behavior recognition accuracies of 92.16%, 94.37%, and 91.62% on The Collective Activity Dataset, The Collective Activity Extended Dataset, and The Volleyball Dataset, respectively. Compared with the baseline, the improvements are 1.16, 1.5, and 0.22 percentage points, respectively. |
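
The idea of building an individual relation graph from motion features and then applying graph convolution can be illustrated with a minimal sketch. The abstract does not specify the thesis's actual modules, so everything here is an assumption made for illustration: motion is approximated by the temporal difference of per-individual features, relation values are scaled dot products normalized with a softmax, and the function names (`motion_features`, `relation_graph`, `graph_conv`, `group_feature`) are hypothetical.

```python
import math


def motion_features(feats_prev, feats_next):
    """Assumed motion descriptor: per-individual feature difference between two frames."""
    return [[b - a for a, b in zip(p, n)] for p, n in zip(feats_prev, feats_next)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def relation_graph(motion):
    """Relation values as scaled dot products of motion features (an assumption),
    row-normalized so each individual's relations sum to 1."""
    d = len(motion[0])
    scores = [
        [sum(a * b for a, b in zip(mi, mj)) / math.sqrt(d) for mj in motion]
        for mi in motion
    ]
    return [softmax(row) for row in scores]


def graph_conv(adj, feats, weight):
    """One graph convolution step: ReLU(A @ H @ W), written with plain lists."""
    n, c = len(feats), len(feats[0])
    # Aggregate neighbor features according to the relation graph: A @ H.
    agg = [[sum(adj[i][k] * feats[k][j] for k in range(n)) for j in range(c)]
           for i in range(n)]
    # Apply the linear transform W followed by ReLU.
    return [
        [max(0.0, sum(row[j] * weight[j][o] for j in range(c)))
         for o in range(len(weight[0]))]
        for row in agg
    ]


def group_feature(node_feats):
    """Max-pool over individuals to obtain a single group-level feature."""
    return [max(col) for col in zip(*node_feats)]


if __name__ == "__main__":
    # Three individuals with 2-D toy features at two consecutive frames.
    prev = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    nxt = [[1.0, 0.0], [2.0, 0.0], [-1.0, 1.0]]
    adj = relation_graph(motion_features(prev, nxt))
    identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for learned weights
    print(group_feature(graph_conv(adj, nxt, identity)))
```

In this toy setup, individuals that move in the same direction receive larger relation values than individuals that move in opposite directions, which is the kind of time-varying association that a graph built from spatial features alone would miss. In a trained network the weight matrix would be learned rather than fixed.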