
Research On Classification And Recognition Of Visual Targets Based On Machine Learning

Posted on: 2020-08-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z K Weng
Full Text: PDF
GTID: 1368330605972830
Subject: Signal and Information Processing
Abstract/Summary:
With the continuous development of computer technology and growing public safety awareness, intelligent image and video analysis is attracting widespread attention as a common security measure. Classification and recognition methods from China and abroad are surveyed extensively, and current machine-learning-based methods for visual object classification and recognition are found to have several limitations. In-depth research is carried out on visual object classification, moving object representation and recognition, and video activity prediction. The main contributions are as follows:

A non-destructive classification method is proposed for ancient ceramics identification. The curvature of the edge contour of ancient ceramics is obtained using the differential chain code. Multi-channel color features of the glaze are extracted in HSI color space, and LBP features of the decorations are extracted. Ancient ceramics from different periods are dated based on these visual features. The experimental results show that the method performs better than single-feature recognition for non-destructive classification.

A spatio-temporal edge trajectory is proposed for moving target representation. Edge trajectories with similar spatio-temporal and motion characteristics are regarded as a group of skeletons, to better describe how an action evolves under different motion modes. A coding method based on magnitude and direction information is introduced to cluster these trajectories. The resulting spatio-temporal motion skeleton descriptor fully considers the motion similarity between spatio-temporal trajectories. The experimental results show that the proposed method is better suited to representing and recognizing actions in real, unconstrained videos.

A stacked trajectory energy image method is proposed to represent the long-term dependence of frames. A global convolution operation ignores the scale and spatial position of the moving object. A trajectory-aware hierarchical convolution strategy is proposed to decompose each frame into three levels: the complete video region, the foreground target region, and the fine motion region, so that a video action can be represented at multiple levels. Multi-frame trajectories are mapped onto an image to construct the stacked trajectory energy image, which effectively describes the long-term motion characteristics of the video. The experimental results show that the proposed method retains and enriches the temporal and spatial information of moving targets and improves recognition performance.

A multi-modality trajectory-aware CNN model is proposed for video action recognition. The two-stream convolutional neural network is extended to a three-stream network. The spatial CNN, the temporal CNN, and the global motion CNN extract the static, dynamic, and global motion characteristics of actions, respectively. A trajectory-aware multimodal behavior descriptor for moving targets in video is formed by aggregating and connecting the convolutional features of the three modalities. A linear support vector machine is employed to classify this multimodal convolutional behavior descriptor. The experimental results show that the proposed framework effectively recognizes human actions in videos and achieves higher recognition accuracy than a single-modality network.

A dynamic enhancement method for foreground motion trajectories based on a boundary prior is proposed. An undirected weighted network is constructed, and the geodesic distance is defined as the distance between two superpixels. The saliency probability of a superpixel is defined by the cumulative weighted shortest path, obtained by calculating its geodesic distance to the boundary. At the same time, a dynamic contrast segmentation strategy is introduced to obtain the fine motion region, which achieves robust sampling of the moving target region in video.

An activity prediction method based on an attention-based temporal encoding network is proposed. A long short-term memory network is adopted. The output features focus more on the semantic key frames, which effectively models long-term sequences while suppressing temporal redundancy. The experimental results show that the proposed framework predicts actions well in the early stage of a video and effectively improves prediction performance.
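
As an illustration of the differential chain code used above for ceramic edge contours, the following is a minimal sketch of the textbook first difference of an 8-direction Freeman chain code. The abstract does not state how the dissertation converts this code into curvature, so the function names and the signed-turn mapping here are assumptions for illustration only.

```python
def differential_chain_code(chain, directions=8):
    """First difference of a Freeman chain code for a closed contour.

    Each output value is the change in direction between consecutive
    contour steps, taken modulo the number of directions. The first
    element is compared against the last one (closed contour assumption).
    """
    return [(chain[i] - chain[i - 1]) % directions for i in range(len(chain))]


def signed_turns(diff_code, directions=8):
    """Map differences into a signed range so that the sign gives the
    turning sense. Hypothetical helper: a crude stand-in for discrete
    curvature, not necessarily the dissertation's definition."""
    half = directions // 2
    return [d if d <= half else d - directions for d in diff_code]


# A square traced with codes 0 = east, 6 = south, 4 = west, 2 = north.
square = [0, 0, 6, 6, 4, 4, 2, 2]
diff = differential_chain_code(square)
print(diff)                # [6, 0, 6, 0, 6, 0, 6, 0]
print(signed_turns(diff))  # [-2, 0, -2, 0, -2, 0, -2, 0]
```

Because only direction changes are kept, rotating the contour by a multiple of 45 degrees leaves the differential code unchanged, which is one reason the representation suits shape comparison.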
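
The boundary-prior saliency step described above can be sketched as a multi-source shortest-path computation over a superpixel graph. This is a minimal sketch under stated assumptions: the graph is given as a hypothetical adjacency dictionary, edge weights stand for some appearance or motion difference between neighboring superpixels, the graph is connected, and saliency is taken as the distance normalized to [0, 1]. The abstract does not specify the weights or the normalization.

```python
import heapq


def geodesic_to_boundary(adjacency, boundary):
    """Cumulative weighted shortest-path distance from every superpixel
    to the nearest boundary superpixel (multi-source Dijkstra).

    adjacency: {node: {neighbor: non-negative weight}}, undirected.
    boundary:  iterable of nodes lying on the image border.
    """
    dist = {node: float("inf") for node in adjacency}
    heap = []
    for b in boundary:
        dist[b] = 0.0
        heapq.heappush(heap, (0.0, b))
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adjacency[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def saliency_from_distance(dist):
    """Normalize distances to [0, 1]; assumes a connected graph so that
    every distance is finite. The normalization is an assumption."""
    largest = max(dist.values())
    return {n: (d / largest if largest > 0 else 0.0) for n, d in dist.items()}


# Toy chain of four superpixels; nodes 0 and 3 touch the image border.
graph = {
    0: {1: 0.1},
    1: {0: 0.1, 2: 0.8},
    2: {1: 0.8, 3: 0.5},
    3: {2: 0.5},
}
distances = geodesic_to_boundary(graph, boundary=[0, 3])
print(saliency_from_distance(distances))
```

Superpixels that can reach the border only through large weight changes receive high saliency, which matches the intuition that foreground regions differ from the background touching the image boundary.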
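
The idea of output features that focus on semantic key frames can be illustrated with a generic soft-attention pooling over per-frame features. This is a sketch, not the dissertation's network: the per-frame vectors stand in for LSTM outputs, and the `query` vector is a hypothetical stand-in for learned attention parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Weight each frame feature by its softmax-normalized dot product
    with the query, then return the weights and the weighted sum.

    features: list of equal-length per-frame vectors (e.g. LSTM outputs).
    query:    vector of the same length (hypothetical learned parameter).
    """
    scores = [sum(f_k * q_k for f_k, q_k in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(dim)]
    return weights, pooled


# Three frames; the third aligns most strongly with the query.
frames = [[1.0, 0.0], [0.0, 1.0], [5.0, 0.0]]
weights, pooled = attention_pool(frames, query=[1.0, 0.0])
print(weights, pooled)
```

Frames with low scores contribute little to the pooled vector, which is one simple way to suppress temporally redundant frames while keeping a fixed-size representation of a partially observed video.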
Keywords/Search Tags:Machine learning, Visual object classification, Convolutional neural network, Motion representation, Activity prediction