Clothing image segmentation predicts a clothing-category label for each pixel of an input image. It plays an important role in intelligent clothing applications and is a key element in fashion trend prediction, intelligent clothing design, and 3D virtual dressing technology; achieving high-precision segmentation of clothing images has therefore become a popular research direction. Although deep learning-based semantic segmentation algorithms are more accurate than traditional image segmentation algorithms, most of them cannot be applied directly to clothing images: the variability of camera shots, the similarity between clothing categories, and the complexity of garment boundaries lead to rough boundaries and confused categories. To address these issues, this paper proposes an improved ResNet50-based semantic segmentation model with an encoder-decoder structure for high-precision segmentation of clothing images.

Clothing sizes vary greatly across shots, which places high demands on the adaptability of the model. This paper therefore proposes a mixed spatial pyramid pooling module, MSPP. MSPP uses pooling and dilated convolution to obtain clothing features at different receptive-field scales, and adds an excitation branch that extracts clothing-category information. Combining the excitation branch with the pooling and dilated branches produces dense contextual information enriched with category information, improving the model's adaptability to clothing of different sizes and its accuracy in identifying clothing categories. During convolution and pooling, however, MSPP inevitably loses detailed information, including clothing-boundary information, which reduces the model's accuracy in segmenting clothing boundaries. To address this, the paper proposes an auxiliary global feature extraction branch, AGE, which uses a large
convolution kernel with a large receptive field to extract overall clothing features, compensating for the detail lost in MSPP, improving the model's ability to segment clothing boundaries accurately, and alleviating pixel confusion within clothing regions.

Encoder-decoder models first reduce the resolution of the feature maps stage by stage and then restore it stage by stage, producing an imbalance between spatial information, which carries clothing shape features, and channel information, which carries category features, across stages. This paper therefore proposes a spatial information enhancement module, SIE, and a channel information enhancement module, CIE. The former helps the model restore clothing shape by extracting spatial information at two levels, while the latter improves category recognition accuracy by squeezing and exciting channel information. Together, SIE and CIE promote the flow of information between stages and help balance shape and category information.

The model was trained and tested on the DeepFashion2 dataset. Comparative experiments show that, compared with state-of-the-art models such as CCNet, DeepLabv3+, and PSPNet, the proposed model achieves the highest mIoU and Boundary IoU, 74.55% and 57.51% respectively. Finally, to test the model's feasibility and practicality, it was applied to 3D clothing reconstruction. First, the segmentation model removed the background from real clothing photos. Then the SIFT algorithm detected and matched feature points, the SfM algorithm performed sparse reconstruction, the PMVS algorithm performed dense reconstruction, and finally Poisson surface reconstruction generated the 3D model of the clothing. Compared with photos that retain their background, using photos processed by the proposed model for 3D
reconstruction yields more realistic results.
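The CIE module described above follows the familiar squeeze-and-excitation pattern: spatially pool each channel, pass the result through a small bottleneck, and use the resulting gates to reweight the channels. The following NumPy sketch illustrates that pattern only; it is not the paper's implementation, and the reduction ratio, layer sizes, and random weights are illustrative assumptions.

```python
import numpy as np

def channel_excitation(x, w1, w2):
    """Squeeze-and-excitation style channel reweighting (illustrative sketch).

    x  : feature map, shape (C, H, W)
    w1 : weights of the squeeze FC layer, shape (C, C // r)
    w2 : weights of the excitation FC layer, shape (C // r, C)
    """
    # Squeeze: global average pooling collapses the spatial dims -> (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck MLP followed by a sigmoid gate in (0, 1)
    s = np.maximum(z @ w1, 0.0)             # ReLU, shape (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(s @ w2)))  # sigmoid, shape (C,)
    # Rescale each channel of the input feature map by its gate
    return x * gate[:, None, None]

# Hypothetical shapes: 8 channels, 4x4 spatial map, reduction ratio r = 2
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C, C // r))
w2 = rng.standard_normal((C // r, C))
y = channel_excitation(x, w1, w2)
print(y.shape)  # (8, 4, 4)
```

Because every gate lies strictly between 0 and 1, the module can only attenuate channels relative to one another, which is what lets it emphasize category-relevant channel information without changing the feature map's shape.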