| In recent years, with the widespread deployment of intelligent monitoring systems and the continuing development of computer vision, group behavior recognition has become a key technology in video surveillance, with important research significance for public safety, human-computer interaction, and video analysis. Rapid social development has created an urgent need to prevent large-scale group incidents effectively, and an efficient, stable group behavior recognition algorithm is the key to meeting that need. However, several core issues remain open when constructing such an algorithm. First, network operating speed determines whether an intelligent video surveillance system can run in real time. Second, interaction modeling determines the accuracy of group behavior recognition. Third, the combined use of multi-cue features is needed to improve recognition performance overall. This paper addresses these three issues and makes the following contributions: (1) Slow network operation is the core problem limiting the real-time performance of intelligent monitoring systems. This paper proposes an efficient C3D network model (Efficient 3D Convolutional, EC3D) that speeds up the network during spatio-temporal feature extraction. Because C3D networks have a large number of parameters and train slowly, this paper reduces the parameter count by decomposing the 7×7×3 spatio-temporal convolution in the C3D network into a 7×7×1 spatial convolution and a 1×1×3 temporal convolution. The number of parameters per convolution kernel drops from 7×7×3 = 147 to 7×7×1 + 1×1×3 = 52, a reduction of 64.6%. An experimental comparison of the running speed of EC3D and C3D shows that EC3D can process
more data in the same time, about 5 times as much as C3D, which shows that the proposed modification of C3D is effective and that EC3D can handle more data when extracting spatio-temporal features. (2) Interaction modeling is the core problem affecting the accuracy of group behavior recognition. This paper proposes a group behavior recognition network model based on EC3D and interaction modeling, which exploits the interactions between people within a group by constructing an undirected group graph. First, EC3D extracts the spatio-temporal features of each individual in each group of video frames. Next, an undirected graph of the interactions between group members is built from each individual's spatio-temporal features and location information: the vertices are the group members, an edge between two vertices indicates an interaction between those two members, and the thickness of the edge indicates the strength of the relationship. A graph convolutional network (GCN) then dynamically maintains this interaction graph and produces interaction features that are used for group behavior classification. Experiments show that recognition based on interaction features compensates for the shortcomings of methods that ignore interactions, thereby improving the accuracy of group behavior recognition. (3) The combined use of multi-cue features is the core issue in improving the accuracy of group behavior recognition. The overall network architecture is designed as a hierarchical model in which group behavior is recognized from different features. The first-layer network pre-recognizes group behavior from the interaction features described above, and another layer of
network builds a second pre-recognition channel that uses the EC3D network to extract spatio-temporal features of the global scene. Because the performance of the two layers' Softmax classifiers varies during recognition, this paper uses a multi-classifier adaptive-weight decision fusion algorithm to weight the two results and make the final decision. The basic idea is as follows: following the K-nearest-neighbor classification criterion, the effective neighborhood of a test sample is determined by calculating the cluster similarity between the test sample and the training samples; weights are assigned to the classifiers according to their classification accuracy within that neighborhood; and the classifier outputs are then fused with those weights to recognize group behavior in complex scenarios. To verify the effectiveness of the proposed algorithm, extensive experiments were performed on two public group behavior recognition datasets, CAD (Collective Activity Dataset) and CAE (Collective Activity Extended Dataset), where the average recognition accuracy reached 91.4% and 97.9%, respectively, outperforming currently popular recognition methods. These results demonstrate the effectiveness and feasibility of the proposed EC3D and interaction modeling method for group behavior recognition. |
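
The per-kernel parameter arithmetic in contribution (1) can be checked with a short sketch. It counts only the weights of a single-channel kernel and ignores biases and input/output channel counts, which the abstract does not specify; the helper name `kernel_params` and the variable names are illustrative assumptions, not identifiers from the paper.

```python
def kernel_params(height: int, width: int, depth: int) -> int:
    """Number of weights in one single-channel 3D kernel (bias ignored)."""
    return height * width * depth


# Original C3D-style kernel: one joint 7x7x3 spatio-temporal convolution.
full = kernel_params(7, 7, 3)

# Factorized version described in the abstract:
# a 7x7x1 spatial kernel plus a 1x1x3 temporal kernel.
spatial = kernel_params(7, 7, 1)
temporal = kernel_params(1, 1, 3)
factorized = spatial + temporal

# Relative reduction in weights per kernel.
reduction = 1 - factorized / full

print(f"full kernel:       {full} weights")
print(f"factorized kernel: {factorized} weights")
print(f"reduction:         {reduction:.1%}")
```

The saving comes from replacing a product of the spatial and temporal extents with a sum of them; the figure applies per kernel, so the reduction for a whole network depends on its channel configuration.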