
Research And Application Of Clothing Landmark Detection Based On Human Pose Estimation

Posted on: 2021-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: C X Li
GTID: 2481306308478934
Subject: Electronics and Communications Engineering
Abstract/Summary:
Human pose estimation is a research direction in computer vision. By locating the skeletal joints of the human body, it supports applications such as action recognition, human-computer interaction and autonomous driving. Because real scenes usually contain several people, the common solution is to detect human bounding boxes first and then perform single-person pose estimation within each box.

Visual fashion is a field that has emerged in recent years. Differences in how people understand clothing, together with individual subjectivity, create communication barriers between consumers and businesses. Using artificial intelligence to make machines "understand" fashion and to establish a unified standard of professional clothing knowledge can improve the shopping experience for customers.

This thesis completes data preprocessing, dataset partitioning, training label generation and data augmentation, and then proposes solutions for multi-garment and single-garment clothing landmark detection, covering the following two aspects:

(1) Based on the Mask-RCNN architecture, this thesis proposes a multi-task learning network for clothing detection and landmark localization in complex scenes. The network contains three improvements. First, features are extracted with DenseNet and fused with a feature pyramid network, and the P2-P5 multi-scale features are concatenated to enrich the context information available to the subsequent network. Second, the mask branch is a fully convolutional network whose decoder fuses the last three convolutional layers through deconvolution to retain more detail, which helps improve localization. Third, Gaussian heat maps replace key point masks as training labels, which increases the supervision information for each point. In experiments, compared with the Mask-RCNN baseline, this network achieves relative improvements of 16.4% in IOU and 28.4% in NE; the IOU score reaches 90.2%, the NE score reaches 4.21%, and the speed is about 10 images per second.

(2) Drawing on single-person pose estimation, this thesis proposes a network for single-garment landmark detection. The model contains three modules. The feature extraction module stacks a global network (H1) and an hourglass network (H2) to improve feature extraction capability; H1 uses transfer learning to initialize its encoder, and H2 applies dilated convolution instead of pooling to integrate multi-scale features effectively. The refinement network module combines the features of each scale in the H1 and H2 decoders, applies deconvolution for learned up-sampling, and concatenates the resulting features channel-wise. The intermediate supervision module computes the loss at each stage. A grouped L2 loss is proposed to address the sparsity caused by the small number of valid points in a heat map; it computes the positive and negative losses separately to reduce interference between the two categories. Soft OHEM (online hard example mining) assigns weights to key points in the loss function to distinguish easy points from hard ones, so that training pays more attention to correcting hard points. With the cascaded pyramid network as the baseline, the three modules achieve relative improvements in NE of 13.53%, 10% and 7.24% respectively; the NE is reduced to 3.84% at a prediction speed of 17.7 images per second, balancing speed and accuracy. Data augmentation at prediction time further reduces the NE to 3.63%.

Through the study of computer vision algorithms, this thesis proposes several improvements for clothing landmark detection in the field of visual fashion. These results offer a useful reference for future practical applications and for extensions to related fields.
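The Gaussian heat map label mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives neither the heat map resolution nor the spread of the Gaussian, so the function name `gaussian_heatmap`, the grid size and the `sigma` value below are assumptions chosen only for illustration.

```python
import math

def gaussian_heatmap(width, height, cx, cy, sigma=2.0):
    """Return a height x width grid that peaks at 1.0 on the landmark (cx, cy).

    sigma is a hypothetical spread; the thesis abstract does not state one.
    """
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

# One landmark at column 3, row 2 of a small 8 x 6 label map.
heatmap = gaussian_heatmap(width=8, height=6, cx=3, cy=2)
```

Unlike a binary key point mask, every pixel near the landmark receives a graded target value, which is one way to read the abstract's remark about increased supervision for each point.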
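The grouped L2 loss is described only as computing the positive and negative losses separately. One possible reading, sketched below, averages the squared error over positive pixels and over negative pixels independently and then adds the two terms. The function name `grouped_l2_loss`, the threshold `pos_threshold` used to split the groups, and the equal weighting of the two terms are all assumptions, not details taken from the thesis.

```python
def grouped_l2_loss(pred, target, pos_threshold=0.1):
    """Average squared error per group, then sum the two group means.

    A pixel counts as positive when its target value exceeds pos_threshold
    (a hypothetical cut-off); all other pixels are negative.
    """
    pos_sq, neg_sq = [], []
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            group = pos_sq if t > pos_threshold else neg_sq
            group.append((p - t) ** 2)
    pos_loss = sum(pos_sq) / len(pos_sq) if pos_sq else 0.0
    neg_loss = sum(neg_sq) / len(neg_sq) if neg_sq else 0.0
    return pos_loss + neg_loss

# A 3 x 3 target with a single valid point, and an all-zero prediction.
target = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
blank = [[0.0] * 3 for _ in range(3)]
loss = grouped_l2_loss(blank, target)
```

In this example a plain mean over all nine pixels would dilute the missed landmark to 1/9, whereas the grouped form keeps the positive term at full strength, which matches the sparsity motivation given in the abstract.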
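The abstract states that Soft OHEM weights key points so that hard points receive more attention, but it does not give the weighting rule. The sketch below shows one simple scheme under that assumption: each key point's weight is proportional to its own loss, scaled so that the weights average to 1. The function name `soft_ohem_loss` and this particular rule are hypothetical.

```python
def soft_ohem_loss(per_point_losses, eps=1e-12):
    """Weighted mean of per-key-point losses, with harder points weighted more.

    Weights are proportional to each point's loss and average to 1.
    This rule is an illustrative assumption, not the thesis's formula.
    """
    k = len(per_point_losses)
    total = sum(per_point_losses)
    if total <= eps:
        return 0.0  # nothing to correct, or no key points at all
    weights = [k * loss / total for loss in per_point_losses]
    return sum(w * l for w, l in zip(weights, per_point_losses)) / k

uniform = soft_ohem_loss([1.0, 1.0])   # equal difficulty: same as a plain mean
skewed = soft_ohem_loss([0.0, 2.0])    # the hard point dominates the result
```

Hard OHEM would instead keep only the top few losses and discard the rest; the soft form keeps every key point in the loss while shifting emphasis toward the difficult ones.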
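The NE figures reported above can be read with the following sketch, which assumes that NE denotes a normalized error: the mean Euclidean distance between predicted and ground-truth landmarks, divided by a reference length. The abstract does not define the metric, so the function name `normalized_error`, the visibility flags and the choice of `norm_dist` are assumptions.

```python
import math

def normalized_error(pred, truth, visible, norm_dist):
    """Mean landmark distance over visible points, divided by norm_dist.

    norm_dist is a reference length for the garment; which length the
    thesis uses is not stated in the abstract.
    """
    dists = [math.dist(p, t) for p, t, v in zip(pred, truth, visible) if v]
    if not dists:
        raise ValueError("no visible landmarks to evaluate")
    return sum(dists) / (len(dists) * norm_dist)

# Two visible landmarks: one off by 5 pixels, one exact; reference length 10.
ne = normalized_error(
    pred=[(3.0, 4.0), (0.0, 0.0)],
    truth=[(0.0, 0.0), (0.0, 0.0)],
    visible=[True, True],
    norm_dist=10.0,
)
```

Under this reading a lower NE is better, which is consistent with the abstract reporting the drop from 3.84% to 3.63% as an improvement.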
Keywords/Search Tags: pose estimation, multi-task learning, visual fashion, landmark detection, encoder and decoder