
Activity Recognition Based On The Extraction And Analysis Of Human Pose Sequences

Posted on: 2013-10-26    Degree: Doctor    Type: Dissertation
Country: China    Candidate: C Chen    Full Text: PDF
GTID: 1228330401960217    Subject: Computer application technology
Abstract/Summary:
Estimating human poses from still images and recognizing human actions from videos are challenging and widely studied tasks in the computer vision and artificial intelligence communities. They share a broad range of applications, such as human-computer interaction, image retrieval, surveillance, and sports video analysis. Both problems are concerned with analyzing features of the human figure in images, and both suffer interference not only from wide variation in appearance due to skin color and clothing, but also from conditional factors such as background clutter, illumination differences, and shadow. The two problems are closely related, and we combine them in this thesis: we apply video analysis techniques to improve human pose estimation from videos, and we employ the extracted pose sequences to describe human actions from multiple views. The main research problems and contributions are as follows:

1. A new algorithm is proposed to segment the foreground from videos based on analysis of the spatial-temporal information in the video. First, by regarding the change of a single pixel over time as a discrete-time signal, the video is coarsely segmented into foreground and background by applying a Gabor filter in the temporal domain. Second, a global color model and a local color model are defined and built by clustering the color information of the background and foreground. Finally, a double-labeling method operating at both the pixel level and the region level is employed for fine segmentation of the foreground.

2. We propose a description of human poses based on pose-sentences. Believing that explicitly using joint features in pose estimation can improve performance, we model limbs and joints as components of the human body.
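The temporal filtering step of the segmentation algorithm in contribution 1 can be sketched as follows. This is a minimal illustration, not the thesis implementation: the filter parameters (`freq`, `sigma`), the energy threshold, and the function name `temporal_gabor_mask` are all assumptions, and frames are taken to be a grayscale T×H×W array.

```python
import numpy as np

def temporal_gabor_mask(frames, freq=0.25, sigma=2.0, thresh=0.1):
    """Coarse foreground mask: treat each pixel's intensity over time as a
    discrete-time signal, filter it with a 1-D temporal Gabor kernel, and
    mark pixels whose response energy exceeds a threshold as foreground."""
    t = np.arange(-3 * sigma, 3 * sigma + 1)
    gauss = np.exp(-t**2 / (2 * sigma**2))
    kernel = gauss * np.cos(2 * np.pi * freq * t)   # even (cosine) Gabor
    kernel -= kernel.mean()                         # zero response to constants
    # convolve along the temporal axis for every pixel
    resp = np.apply_along_axis(
        lambda s: np.convolve(s, kernel, mode="same"), 0, frames)
    energy = np.abs(resp).mean(axis=0)              # per-pixel response energy
    return energy > thresh                          # boolean H x W mask
```

Static background pixels yield near-zero filter responses (the kernel has zero mean), while pixels whose intensity oscillates due to motion produce high energy and are labeled foreground; the subsequent color-model and double-labeling stages would then refine this coarse mask.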
As the poses of the parts indicate the presence of full-body poses, we split the pose space of each body part into several classes called pose-words, and we apply combinations of pose-words, named pose-sentences, to describe full-body poses. To describe the image information robustly, we propose a simple image descriptor called the Local Pattern of Oriented Gradient, which applies the LBP operator to histograms of orientation. This descriptor is sensitive to variation in the distribution of orientations across neighborhood cells and can capture differences in gradient orientation among different human parts.

3. We apply an efficient message-passing belief propagation method to search for the maximum a posteriori pose in the image. As the dimension of the pose feature vectors is usually in the tens of thousands, we project the pose feature vectors into a low-dimensional embedding space to discard noise and redundant information. An incremental latent Support Vector Machine is applied to train the model in the embedding space. We find that the dimension reduction significantly reduces memory usage and training time while only slightly affecting performance. We also investigate the performance of several linear and non-linear dimensionality reduction methods on the pose features and find that Orthogonal Linear Graph Embedding outperforms the other methods most of the time.

4. We propose a method for recognizing human actions based on human pose sequences extracted from videos. Given a video, limb masks are extracted by clustering image features and motion features in the human region; these masks help reduce interference from the background and partially address the "double-counting" problem during pose estimation. The extracted pose sequence is then smoothed with a Kalman smoother to remove noise and make the changes of pose consistent.
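The pose-sequence smoothing step can be sketched with a standard Rauch-Tung-Striebel (Kalman) smoother. The abstract does not specify the motion model, so a constant-velocity model per joint coordinate is assumed here; `kalman_smooth` and its noise parameters `q` and `r` are illustrative names, and each joint coordinate would be smoothed independently.

```python
import numpy as np

def kalman_smooth(z, dt=1.0, q=1e-2, r=1e-1):
    """Rauch-Tung-Striebel smoother for one joint coordinate over time,
    assuming a constant-velocity state [position, velocity]."""
    F = np.array([[1.0, dt], [0.0, 1.0]])       # state transition
    H = np.array([[1.0, 0.0]])                  # we observe position only
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],
                      [dt**2 / 2, dt]])         # process noise covariance
    R = np.array([[r]])                         # measurement noise covariance
    n = len(z)
    x = np.zeros((n, 2)); P = np.zeros((n, 2, 2))
    xp = np.zeros((n, 2)); Pp = np.zeros((n, 2, 2))
    x_est, P_est = np.array([z[0], 0.0]), np.eye(2)
    # forward Kalman filter pass
    for k in range(n):
        xp[k] = F @ x_est
        Pp[k] = F @ P_est @ F.T + Q
        S = H @ Pp[k] @ H.T + R
        K = Pp[k] @ H.T @ np.linalg.inv(S)
        x[k] = xp[k] + (K @ (z[k] - H @ xp[k])).ravel()
        P[k] = (np.eye(2) - K @ H) @ Pp[k]
        x_est, P_est = x[k], P[k]
    # backward (RTS) smoothing pass
    xs = x.copy()
    for k in range(n - 2, -1, -1):
        C = P[k] @ F.T @ np.linalg.inv(Pp[k + 1])
        xs[k] = x[k] + C @ (xs[k + 1] - xp[k + 1])
    return xs[:, 0]                             # smoothed positions
```

Because the backward pass conditions each estimate on the whole sequence, jitter in the raw per-frame pose estimates is suppressed while consistent motion is preserved, which is the stated goal of the smoothing step.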
Multiple kinds of feature sequences based on the pose information are extracted to describe human actions from different views. Finally, we apply a mixture of HCRFs to recognize the human actions: we train one HCRF for each kind of feature sequence and combine the conditional probabilities from the different feature sequences to improve recognition accuracy.

Experiments on several public benchmark datasets demonstrate the effectiveness of the proposed methods for foreground segmentation, pose estimation, and action recognition.
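The final fusion step, combining the conditional probabilities produced by the per-feature HCRFs, can be sketched as follows. The abstract does not state the combination rule; a weighted average over model outputs is assumed, and `fuse_predictions` is a hypothetical helper name.

```python
import numpy as np

def fuse_predictions(prob_list, weights=None):
    """Late fusion of per-feature-sequence classifiers: each element of
    prob_list is an (n_classes,) vector of conditional class probabilities
    P(action | feature sequence) from one model (one HCRF per feature kind).
    The combination rule assumed here is a weighted average."""
    probs = np.asarray(prob_list, dtype=float)
    if weights is None:
        weights = np.full(len(probs), 1.0 / len(probs))  # uniform by default
    weights = np.asarray(weights, dtype=float)
    fused = weights @ probs            # (n_classes,) weighted average
    fused /= fused.sum()               # renormalise to a distribution
    return int(np.argmax(fused)), fused
```

A product rule (multiplying the per-model probabilities) would be an equally plausible reading of "combine the conditional probability"; the weighted average is simply one concrete choice for illustration.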
Keywords/Search Tags: Spatial-Temporal Analysis, Local Pattern of Oriented Gradient, Pose Sentences, Pose Array Based Mixture Feature