Research And Application Of 3D Human Skeleton Extraction And Motion Recognition For Multi-Kinect

Posted on: 2021-04-07
Degree: Master
Type: Thesis
Country: China
Candidate: Z Song
GTID: 2428330602481482
Subject: Software engineering
Abstract/Summary:
As a basic problem in human-centered computer vision, human motion recognition has a wide range of applications in virtual reality, augmented reality, and film and television modeling. Marker-based motion capture requires the subject to wear professional equipment and places high demands on the background, generally a simple background with large color differences, so its cost and experimental complexity are relatively high. Depth acquisition devices such as Kinect, by contrast, offer low cost, high accuracy, and adaptability to complex backgrounds, and are increasingly used in human body modeling, scene reconstruction, and scene classification. The 3D human skeleton extracted from the depth maps captured by such devices is widely used in human motion recognition and human gait recognition because it represents human motion simply and accurately. Research on action recognition from 3D human skeletons therefore has important significance and application value.

Although there is already much research on 3D human motion recognition with a single Kinect, it all faces the same two problems: occlusion leaves the human skeleton collected by a single Kinect incomplete, and skeleton-based motion recognition algorithms make insufficient use of inter-frame information, which limits recognition accuracy. The collected skeleton is most accurate when the whole body directly faces the Kinect camera; when the body does not fully face the camera, some 3D joint points are missing or inaccurate, so the 3D skeleton of the whole body is incomplete. In addition, most motion recognition algorithms based on 3D human skeletons consider only the time dimension and do not make full, effective use of inter-frame and intra-frame information, resulting in low final recognition accuracy.

In response to these problems, this thesis uses three Kinects to collect human skeletons and proposes a weighted fusion algorithm for 3D skeleton generation, which effectively complements the 3D skeleton information to form a high-quality 3D human skeleton. Based on the temporal convolutional network (TCN), it proposes a local-fusion neural network model for high-precision recognition of 3D skeleton sequences, and achieves high classification accuracy on NTU-RGB+D, currently the most challenging dataset.

The main work and innovations of this thesis are as follows. First, 3D human skeleton data at different angles were obtained with three Kinects, and the skeleton was reconstructed with the weighted fusion algorithm proposed here. Second, a local feature fusion temporal convolutional network for human motion recognition was proposed and applied to the NTU-RGB+D dataset, where it achieved higher classification accuracy.

1. To correct inaccuracies in the skeleton data caused by a possible angular tilt of the Kinect camera, the scene depth map at each angle is first obtained from the three Kinects. Each depth map is then converted to a point cloud, which contains the coordinates of every point in the scene and provides the raw data for estimating the principal normal of the ground. Each point in the point cloud forms a patch with two adjacent points, that is, three points form one patch. The normal vectors of all patches connected to the current point are calculated, and their average is used as the normal vector of the current point. The normal vectors of all points are calculated in the same way and then clustered to generate three main normal vectors. The vector with the smallest axis angle is taken as the main normal (the ground normal). The 3D skeleton data from each of the three Kinects is adjusted according to the angle between the main normal and the ground, and the three skeletons are then fused with the weighted average fusion algorithm.

2. To improve the accuracy of motion recognition based on 3D skeleton sequences, this thesis combines intra-frame and inter-frame joint activity information and proposes a local feature fusion temporal convolutional network algorithm for human motion recognition. The algorithm extracts global and local features from the human skeleton sequence, making effective use of spatial information, and then feeds the local information into the temporal convolutional network to learn features along the time dimension. In this way it learns the spatiotemporal features of an action effectively and further improves the accuracy of action recognition.

Finally, experiments were performed on NTU-RGB+D, currently the most challenging 3D dataset. Under both of the dataset's experimental split protocols, the results show that the classification accuracy of the proposed recognition algorithm is better than that of existing methods. In addition, the three-Kinect skeleton fusion and the local feature fusion temporal convolutional network algorithm for human motion recognition were applied in a skeleton visualization and recognition application project.
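The tilt correction and weighted average fusion in item 1 can be illustrated with a short Python/NumPy sketch. This is only a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not say how the fusion weights are chosen, so the per-joint, per-camera weights here are a hypothetical confidence score (0 for a joint the camera did not see); the vertical axis is assumed to be +Y; the three skeletons are assumed to be expressed in a shared coordinate frame already; and the function names are invented for this example.

```python
import numpy as np

def rotation_aligning(src, dst):
    """Rotation matrix that turns the direction of src onto the direction of dst
    (Rodrigues' formula)."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src = src / np.linalg.norm(src)
    dst = dst / np.linalg.norm(dst)
    v = np.cross(src, dst)          # rotation axis scaled by sin(angle)
    c = float(np.dot(src, dst))     # cos(angle)
    s = float(np.linalg.norm(v))    # sin(angle)
    if s < 1e-12:
        if c > 0:
            return np.eye(3)        # already aligned
        # Opposite directions: rotate 180 degrees about any perpendicular axis.
        axis = np.cross(src, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-6:
            axis = np.cross(src, [0.0, 1.0, 0.0])
        axis = axis / np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    k = np.array([[0.0, -v[2], v[1]],
                  [v[2], 0.0, -v[0]],
                  [-v[1], v[0], 0.0]])
    return np.eye(3) + k + (k @ k) * ((1.0 - c) / s**2)

def align_to_ground(joints, ground_normal, up=(0.0, 1.0, 0.0)):
    """Rotate a (J, 3) skeleton so the estimated ground normal matches `up`.
    The choice of +Y as the vertical axis is an assumption of this sketch."""
    rot = rotation_aligning(ground_normal, up)
    return np.asarray(joints, dtype=float) @ rot.T

def fuse_skeletons(skeletons, weights):
    """Weighted average of K skeletons.

    skeletons: (K, J, 3) joint positions, one skeleton per camera.
    weights:   (K, J) non-negative hypothetical confidence per camera and joint.
    Returns a (J, 3) skeleton; joints that no camera saw are NaN.
    """
    skeletons = np.asarray(skeletons, dtype=float)
    weights = np.asarray(weights, dtype=float)
    total = weights.sum(axis=0)                              # (J,)
    weighted = (weights[..., None] * skeletons).sum(axis=0)  # (J, 3)
    fused = np.full(weighted.shape, np.nan)
    seen = total > 0
    fused[seen] = weighted[seen] / total[seen][:, None]
    return fused
```

In use, each camera's skeleton would first be passed through align_to_ground with that camera's estimated ground normal, and the three corrected skeletons would then be stacked and passed to fuse_skeletons. Giving an occluded joint a weight of zero lets the other two cameras fill it in, which is the complementing effect the abstract describes.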
Keywords/Search Tags: Kinect, 3D Human Skeleton, TCN, Skeleton Fusion, Action Recognition