Learning Temporal Structure Of Videos For Action Recognition Using Pattern Theory

Posted on: 2021-03-19 | Degree: Master | Type: Thesis
Country: China | Candidate: X Y Zhang | Full Text: PDF
GTID: 2428330614960352 | Subject: Signal and Information Processing
Abstract/Summary:
Action recognition aims to automatically identify human actions in video by processing and analyzing frame sequences with computer vision techniques. It is an active topic in video analysis and is widely used in intelligent video surveillance, video retrieval, and human-computer interaction. Although existing action recognition models perform well, they still have limitations: end-to-end models have difficulty explaining how an action unfolds, and videos contain substantial background redundancy and noise caused by complex backgrounds and by changes in motion speed, viewpoint, and lighting, which seriously interferes with the extraction of useful information. To address these problems, this thesis analyzes the basic characteristics of the action recognition task and combines multi-scale features with the basic ideas of pattern theory. The research has two parts. Starting from feature descriptions at different time scales, we study how to combine multi-time-scale deep features for action recognition. Starting from the evolution of an action along the time axis, we study the modeling of atomic action sequences, mining the temporal structure of the complete action through pattern theory in order to recognize it. The main work of this thesis is as follows:

(1) To address the problem that different actions change at different rates along the time axis, this thesis proposes a multiple-time-scale CNN-LSTM model for action recognition. The video is divided into segments at different time scales. We use a residual 3D network to build a two-stream feature representation of the video segments and model the whole video sequence with a Long Short-Term Memory network. Finally, we fuse the feature representations of all time scales for action recognition, which effectively improves recognition performance.

(2) To address the defect that most current deep learning methods cannot explain how an action occurs from a cognitive perspective, this thesis uses the k-means method to construct a middle-level semantic representation of atomic actions and models the occurrence of an action as a sequence of atomic actions. This mines the temporal structure of videos and makes action recognition interpretable.

(3) To address the problem that videos of real scenes contain a large amount of noise and background redundancy, this thesis proposes a key generator proposal operation, which removes low-confidence atomic action units from the video and selects the important foreground information, and an interpretable operation, which measures the connection strength between atomic actions. We then optimize the temporal structure within a probabilistic framework, which further suppresses noise and yields the optimal video interpretation. Finally, we implement action recognition through sequence matching methods.
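The idea in item (2), followed by the final matching step, can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes that each video segment has already been reduced to a feature vector, that atomic actions are the k-means cluster indices of those vectors, and that plain edit distance stands in for the unspecified "sequence matching methods". The function names (`kmeans`, `to_atomic_sequence`, `classify`) and the template dictionary are hypothetical, and the pattern-theory inference of item (3) is not reproduced.

```python
import random


def nearest(p, centroids):
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means over lists of floats; returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids


def to_atomic_sequence(segment_features, centroids):
    """Map each segment to its cluster index (assumed to be an atomic action)
    and collapse consecutive repeats into one symbol."""
    labels = [nearest(f, centroids) for f in segment_features]
    return [l for i, l in enumerate(labels) if i == 0 or l != labels[i - 1]]


def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute
        prev = cur
    return prev[-1]


def classify(sequence, templates):
    """Return the label of the template sequence closest to `sequence`.
    `templates` is a hypothetical dict: action label -> atomic sequence."""
    return min(templates, key=lambda name: edit_distance(sequence, templates[name]))
```

In use, `kmeans` would be fitted on segment features from training videos, each video would be converted with `to_atomic_sequence`, and a test video would be assigned the label returned by `classify`. Because the atomic sequence is a readable list of symbols, the intermediate representation can be inspected, which is the sense of interpretability the abstract refers to.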
Keywords/Search Tags: Activity recognition, Multiple time scales, Temporal structure, Pattern theory