| Safety is the premise and foundation of all work. Recognizing and understanding human behavior through video surveillance has long been a key research topic in intelligent security management and control. Although research on single-action recognition has achieved good results, there are few studies on complex human behavior recognition. Limited by complex algorithms, low efficiency, and poor universality, current complex human behavior recognition methods cannot meet the diverse needs of security control systems, and they hinder the intelligent operation of security management and control systems. To address problems in intelligent management and control, such as the high application cost of video surveillance systems, the inability to meet personalized human behavior monitoring requirements, and the difficulty of integrating with other information systems, this paper designs and implements a video surveillance system for complex human behavior recognition based on semantic definition. The final goal of this paper is a complex human behavior recognition method that requires zero training samples. The research covers three aspects: human behavior representation and data augmentation, temporal action recognition, and semantic feature learning of complex human behavior. In the course of this research, the paper puts forward a 3D matrix representation of human behavior based on a dense storage method, a series of human behavior data augmentation methods based on human physiological structure and human motion characteristics, and a large-scale-first temporal detection strategy for temporal action localization. Moreover, based on the definition that “any complex human behavior is composed of basic human actions”, the complex human behavior recognition problem is transformed into human action recognition and similarity judgment of action combinations. Then, a universal complex human behavior recognition model is designed and
constructed. The model can learn the features of complex human behavior at the semantic level and works effectively when no training samples are available. The main work is as follows: 1. To address the loss of motion information caused by spatiotemporal overlap of motion trajectories, and the over-fitting caused by insufficient training samples, research is carried out on multi-feature fusion behavior representation and data augmentation strategies based on human skeleton sequences. A universal representation method that transforms human action video into motion images or motion matrices is proposed. With this method, human motion information can be densely stored, and a set of data augmentation strategies is put forward. First, the geometric features of the human skeleton and the raw motion information of each skeleton joint, such as speed, intensity, and orientation, are calculated from relative coordinates and traced frame by frame. Then, these motion features are stored discretely in a 3D matrix whose axes are time, joint, and feature. This data structure effectively solves the problem of spatiotemporal overlap of motion trajectories. Moreover, with the help of this data structure, a set of credible action sample augmentation methods based on human physiological characteristics and motion rules is designed to mimic people of different heights performing the same action as the original sample at different speeds and orientations. 2. To address the flexible duration of human actions, the irregular transitions between actions, and the imprecise localization of an action in a complex human behavior video, research is carried out on multi-scale learning models and algorithms for temporal action localization. A novel temporal action localization method called the large-scale-first temporal detection (LSF-TD) strategy is proposed. First, based on interval interpolation and abandon algorithms, a human action spatiotemporal scale unifying
strategy is put forward to preserve as much motion information as possible. Then, a universal method is proposed for upgrading a traditional convolutional neural network to a multi-scale network by using a global average pooling layer and a spatial pyramid pooling layer. The upgraded convolutional neural network can learn deep features from action samples of different scales. This not only improves the precision of the model but also provides it with a flexible choice of temporal window. Finally, the idea behind the maximum matching algorithm (MMSeg), a well-known algorithm for Chinese text segmentation, is adopted and improved for temporal action localization. The new algorithm improves the model’s capacity to detect micro-motions and the efficiency of video traversal. 3. To address the low universality and poor semantic comprehension of complex human behavior recognition models, caused by the non-enumerable nature of complex human behaviors, research is carried out on complex human behavior representation and semantic feature learning. A universal complex human behavior recognition model based on semantic feature learning is proposed. It puts forward a new method to quantify complex human behaviors, which vary in class and duration, into a 3D matrix of unified scale. Thus, the deep neural network can learn semantic features from the descriptions of complex human behaviors and achieve zero-shot recognition of complex human behavior. Through the above research, this paper addresses several key issues in human action detection, temporal action localization, and complex human behavior recognition, and puts forward a series of new ideas, models, and methods that are technically feasible and universal. The related research achievements solve several engineering pain points of human behavior monitoring and early warning in security management and control, reduce the cost of applying and promoting the relevant technologies, and improve
the level of intelligence of security control systems. |
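
The large-scale-first idea in item 2, which borrows from maximum matching in text segmentation, can be illustrated with a minimal sketch. This is not the paper's LSF-TD implementation: the function name `large_scale_first_segments`, the `Classifier` type, the confidence threshold, and the toy classifier are all hypothetical stand-ins chosen for illustration. The sketch only shows the greedy control flow: at each position the largest temporal window is tried first, and smaller windows are tried only when the larger ones yield no confident action.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical classifier type: takes a window of frames and returns
# (label, confidence), or None when no action is recognized.
Classifier = Callable[[Sequence[str]], Optional[Tuple[str, float]]]


def large_scale_first_segments(
    frames: Sequence[str],
    classify: Classifier,
    window_sizes: Sequence[int],
    threshold: float = 0.5,
) -> List[Tuple[int, int, str]]:
    """Greedy forward maximum matching over a frame sequence.

    Returns (start, end_exclusive, label) triples.
    """
    # Try the largest temporal scale first; ignore non-positive sizes.
    sizes = sorted({s for s in window_sizes if s > 0}, reverse=True)
    segments: List[Tuple[int, int, str]] = []
    pos = 0
    while pos < len(frames):
        matched = False
        for size in sizes:
            end = pos + size
            if end > len(frames):
                continue  # window does not fit; fall back to a smaller scale
            result = classify(frames[pos:end])
            if result is not None and result[1] >= threshold:
                segments.append((pos, end, result[0]))
                pos = end  # jump past the detected action
                matched = True
                break
        if not matched:
            pos += 1  # no action starts here; advance by one frame
    return segments


def toy_classifier(window: Sequence[str]) -> Optional[Tuple[str, float]]:
    # Toy stand-in for a trained model (an assumption for this example):
    # a window counts as an action only if every frame carries the same
    # non-background tag. "-" marks background frames.
    first = window[0]
    if first != "-" and all(f == first for f in window):
        return first, 1.0
    return None


# Each character stands for one frame: "w" = walking, "j" = jumping.
frames = list("--wwww-jj---wwwwww")
segments = large_scale_first_segments(frames, toy_classifier, window_sizes=[2, 4, 6])
print(segments)  # [(2, 6, 'w'), (7, 9, 'j'), (12, 18, 'w')]
```

In this toy run, the six-frame window fails at position 2 because it would span a background frame, so the four-frame window is accepted instead. The final action, by contrast, is captured in one step by the largest window. Skipping directly to the end of each detected segment is what reduces the number of windows that need to be evaluated during traversal.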