In recent years, the Internet and e-commerce have developed rapidly, and the amount of clothing image data on the Internet has increased dramatically, so how to use these massive image resources to support the development of the clothing industry has become a popular research direction. Because traditional image retrieval methods suffer from subjectivity, a heavy workload, or one-sided descriptions, garment image retrieval based on deep learning has become a hot research topic in this field. At present, most of the labels used in deep learning-based garment image retrieval are relatively simple and cannot reflect the characteristics of garments and the human body. Therefore, this study extracts and analyzes the parameters related to trouser structure and human body features in trouser images, and uses an improved semantic segmentation model to segment, classify, and extract the structural features of trouser images, so as to realize trouser silhouette recognition and fast retrieval of trouser images. The main research contents and conclusions of this paper are as follows:

(1) First, a number of trouser images were collected, and a total of 5291 trouser images meeting the measurement requirements were obtained after data augmentation. Then, all samples in the trouser dataset were annotated using the traditional annotation method. The DeepLabV3+ network model was improved by replacing its backbone network with MobileNetV2, and the improved model was used to recognize trouser silhouettes. The results show that the mean intersection over union (mIoU) of trouser image segmentation is 83.85%, and that the IoU values for the H and A silhouettes are below the mean.

(2) Because traditional labeling methods are inaccurate, the four trouser silhouettes, A, H, V and O, are easily confused during labeling, which leads to
the accuracy of the H and A silhouettes being lower than that of the other silhouettes. Therefore, this paper proposes a label optimization method based on the basic principles of garment ergonomics and garment structure. The trouser silhouette parameters are defined using the ratios of the hip, knee, and leg-opening widths to the waist width, and the angles that the waist, hip, knee, and leg opening each form with the trouser side seam. The confusable samples in the dataset were measured using this set of parameters, and factor analysis was performed on the collected data. The results show that the silhouette parameters can be grouped into two factors, a hip factor (F1) and a thigh factor (F2). Guided by garment structure principles, three of these parameters were selected as trouser silhouette identification parameters, a new model for judging confusable trouser silhouettes was built on them, and all confusable trouser samples were re-labeled. The improved DeepLabV3+ semantic segmentation model was then used to recognize silhouettes on the re-labeled trouser dataset. The results show that the segmentation and classification of all silhouette types improve to some extent after label optimization. In particular, segmentation of H-silhouette trousers improves significantly, with intersection over union (IoU) and accuracy rising by 5% and 7% respectively, and the overall recognition performance of the model also improves to some extent.

(3) To construct a structural feature extraction network for trouser images, the common structural attributes in trouser images are analyzed module by module. Combined with the silhouette label optimization method, four major structural modules of trousers are summarized: the silhouette module, the waist module, the hip module, and the pocket module. A trouser image dataset with 4 categories and 11 structural attributes was constructed in this way. To enable
the network to output both image segmentation results and feature codes, a hash layer and a cosine similarity module are embedded in the base network DeepLabV3+ to form a trouser structure feature extraction network. This method learns hash codes and image representations in a pointwise manner, which makes it better suited to retrieval on large-scale datasets. Two kinds of trouser structure feature extraction networks are constructed according to the input of the embedding layer: a network that uses high-level semantic features and a network that uses fused features. The cosine similarity between the feature code and a binary orthogonal target is used as the loss function, so that the model segments accurately while the binary codes remain discriminative. A comparison of four loss functions shows that the model trains more efficiently when cross-entropy loss plus Dice loss is used as the loss function.

(4) To make the feature extraction model output feature codes with high retrieval efficiency, two trouser structure feature extraction networks are designed, which feed high-level semantic features and fused features, respectively, into the embedding layer: the high-semantic feature extraction network and the fusion feature extraction network. The feature codes output by the two models were used as the basis for retrieval, and the two models were evaluated by average retrieval accuracy and module retrieval accuracy. The experimental results show that when the number of retrieved results is 5, the retrieval codes output by the fusion feature extraction network are 5.8% higher in retrieval accuracy and 2.1% higher in module retrieval accuracy than those output by the high-semantic feature extraction network. When the
number of retrieved results is 10, the retrieval codes output by the fusion feature extraction network are 36.1% higher in retrieval accuracy and 7.2% higher in module retrieval accuracy than those output by the high-semantic feature extraction network. Therefore, retrieval performance is better when fused features are used as the input for generating the retrieval code.
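
As a rough illustration of the code learning and retrieval steps described in (3) and (4), the following Python sketch shows one way a cosine loss against binary orthogonal targets and top-k retrieval over binarized codes could look. It is not the implementation used in this work: the Hadamard-matrix targets, the `1 - cosine` form of the loss, sign binarization, Hamming-distance ranking, and all function names are assumptions made for illustration.

```python
import math


def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two).

    Assumption: its rows serve as the binary orthogonal targets, one per class.
    """
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_loss(code, target):
    """Assumed loss form: 0 when the code is aligned with its target."""
    return 1.0 - cosine(code, target)


def binarize(code):
    """Turn a real-valued feature code into a +1/-1 hash code by sign."""
    return [1 if v >= 0 else -1 for v in code]


def hamming(a, b):
    """Number of positions where two hash codes differ."""
    return sum(x != y for x, y in zip(a, b))


def precision_at_k(query_code, query_label, gallery, k):
    """Fraction of the k nearest gallery items that share the query label.

    gallery is a list of (hash_code, label) pairs (hypothetical layout).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(gallery, key=lambda item: hamming(query_code, item[0]))
    return sum(label == query_label for _, label in ranked[:k]) / k


# Toy example with 4-bit codes and made-up labels.
targets = hadamard(4)
query = binarize([0.9, -0.7, 0.8, -0.6])
gallery = [
    ([1, -1, 1, -1], "H"),
    ([1, -1, 1, 1], "H"),
    ([1, 1, -1, -1], "A"),
    ([1, 1, 1, 1], "V"),
]
print(cosine_loss([0.9, -0.7, 0.8, -0.6], targets[1]))
print(precision_at_k(query, "H", gallery, k=2))
```

Ranking by Hamming distance over short binary codes is what makes this kind of retrieval cheap on large galleries; the accuracy figures reported above would correspond to averaging a measure like `precision_at_k` over many queries, although the exact evaluation protocol is not specified here.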
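
The segmentation metric reported in (1) and (2) can be sketched in the same spirit. The block below is a minimal pure-Python version of per-class intersection over union and its mean, assuming flattened per-pixel class IDs; the function name and the toy labels are hypothetical, and the handling of the background class in this work may differ.

```python
def mean_iou(pred, truth, num_classes):
    """Return (mean IoU, per-class IoU dict) for flat lists of class IDs.

    Classes that appear in neither pred nor truth are skipped, which is
    one common convention and an assumption here.
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    per_class = {}
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        if union > 0:
            per_class[c] = inter / union
    if not per_class:
        raise ValueError("no class is present in pred or truth")
    return sum(per_class.values()) / len(per_class), per_class


# Toy example: 6 pixels, 3 classes (made-up labels).
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
miou, per_class = mean_iou(pred, truth, num_classes=3)
print(miou, per_class)
```

Reporting the per-class values alongside the mean is what allows statements such as the H and A silhouettes falling below the average.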