Human Parsing In Complex Scenes Based On Deep Learning

Posted on: 2022-11-27
Degree: Master
Type: Thesis
Country: China
Candidate: M Yan
GTID: 2558307154976229
Subject: Control Science and Engineering
Abstract/Summary:
Human parsing plays a crucial role in human-centric applications and their potential downstream uses, such as virtual try-on, virtual reality, and person re-identification. Although remarkable progress has been made on detection, instance segmentation, and single-human parsing, performance in complex scenes, that is, multi-human parsing, is still unsatisfactory. In reality, multiple people usually appear at the same time, so a more effective multi-human parsing structure is needed. In this thesis, a new deep-learning framework is proposed for multi-human parsing, and it is further improved with dynamic convolution and an attention mechanism. The main achievements are as follows.

First, this thesis proposes a new multi-human parsing structure (NTHP) that is end-to-end, anchor-free, and based on deep learning. It is the first to tackle this task with single-stage instance segmentation, directly predicting the classification and pixel-level segmentation results for each human and each part at the same time. For pixel-level segmentation, the structure follows the prototype idea: it predicts a set of prototypes shared among humans and parts, together with their coefficients, and linearly combines the two to obtain the segmentation results. A post-processing operation then produces the final parsing results by calculating the overlap ratio between each human and each part.

Second, this thesis further improves the proposed structure with dynamic convolution and an attention mechanism. When predicting the pixel-level segmentation results, the idea of dynamic convolution is used: the convolution kernels and the convolution features are first predicted separately and then convolved together to obtain the final segmentation results. In addition, for the prediction of the convolution features, a position-sensitive module based on the attention mechanism is proposed to capture translation variance and rich context information.

Experimental results show that the multi-human parsing framework designed in this thesis performs better than other advanced algorithms.
Keywords/Search Tags:Multi-human parsing, Deep learning, Instance segmentation, Convolutional neural network
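
The two steps named in the abstract, combining shared prototypes with per-instance coefficients and then matching parts to humans by overlap ratio, can be illustrated with a minimal NumPy sketch. This is not the thesis's implementation. The function names, the sigmoid with a 0.5 threshold, the definition of overlap ratio as intersection divided by part area, and the `min_ratio` cutoff are all assumptions made for illustration.

```python
import numpy as np


def assemble_masks(prototypes, coefficients):
    """Linearly combine shared prototypes into per-instance masks.

    prototypes:   array of shape (K, H, W), shared by all humans and parts
    coefficients: array of shape (N, K), one row per predicted instance
    Returns a boolean array of shape (N, H, W).
    The sigmoid and 0.5 threshold are assumptions, not taken from the thesis.
    """
    logits = np.tensordot(coefficients, prototypes, axes=([1], [0]))
    probs = 1.0 / (1.0 + np.exp(-logits))
    return probs > 0.5


def assign_parts_to_humans(human_masks, part_masks, min_ratio=0.5):
    """Assign each part to the human it overlaps most.

    Overlap ratio is assumed here to be intersection / part area.
    Returns one human index per part, or -1 when the part is empty
    or no human reaches min_ratio (a hypothetical cutoff).
    """
    assignments = []
    for part in part_masks:
        area = part.sum()
        if area == 0:
            assignments.append(-1)
            continue
        ratios = [(part & human).sum() / area for human in human_masks]
        best = int(np.argmax(ratios))
        assignments.append(best if ratios[best] >= min_ratio else -1)
    return assignments
```

In this sketch a part mask that lies entirely inside one human mask has an overlap ratio of 1.0 with that human and is assigned to it, while a part that overlaps no human well enough is left unassigned.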