| With the development of the Internet and the popularity of smartphones, the way people shop has changed considerably. Consumption has shifted from traditional offline physical stores to online e-commerce, and classification based on simple image features alone can no longer meet people’s needs. Although much existing work uses the text information in product images, the way the two kinds of features are combined is relatively simple: stitching (concatenating) text features and image features produces no information interaction between them and merely provides two separate single features to the model at the same time. In addition, commodity images form a fine-grained data set. Products from different manufacturers, of different models, and even different models within the same batch yield images with a low degree of differentiation, which degrades the classification performance of the model. Against this background, and according to the characteristics of product image data, the main research work of this paper is as follows: (1) We propose a product image classification method based on multi-feature fusion. Because simple image features cannot meet the needs of classification, this paper extracts the rich text content from images and uses a language model to capture its context information and obtain text features. A convolutional neural network is used to extract deep image features from commodity images, and the relationship between the two kinds of features is then explored in order to classify the images. We compare the multimodal fusion method with other fusion methods and with classification on each feature separately; the comparison verifies the effectiveness of multimodal fusion for product image classification and shows improved classification accuracy. (2) We propose a coarse-to-fine, fine-grained classification method for commodity images based on text areas. By extracting and re-clustering the SIFT features of all detected text areas in commodity images, the key text areas of each product category are found. These key text areas are first coarsely classified, and the commodity images are then subdivided, which realizes coarse-to-fine, fine-grained classification of commodity images based on text areas. This method exploits the key text areas in commodity images and outperforms classification that uses SIFT features directly. (3) We design and develop a commodity query system based on a WeChat mini program, with a back end developed in Python using the Flask framework. For offline retail, the system helps streamline the product classification process and the merchant management process. |
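
The contrast drawn above between stitching features and letting them interact can be illustrated with a small sketch. The abstract does not say which fusion operator the paper uses, so the outer-product (bilinear) fusion below is only an assumed stand-in for a fusion with cross-feature interaction. The function names and the toy feature vectors are hypothetical.

```python
# Illustrative sketch only. The fusion operator used in the paper is not
# given in the abstract; bilinear (outer-product) fusion is an assumed
# example of a fusion in which the two features interact.

def concat_fusion(image_feat, text_feat):
    """Stitching: the two vectors are placed side by side, unchanged."""
    return list(image_feat) + list(text_feat)

def bilinear_fusion(image_feat, text_feat):
    """Flattened outer product: one entry per (image, text) feature pair."""
    return [i * t for i in image_feat for t in text_feat]

# Hypothetical toy features standing in for CNN and language-model outputs.
image_feat = [0.5, 2.0]
text_feat = [1.0, 0.0, 3.0]

concat = concat_fusion(image_feat, text_feat)
bilinear = bilinear_fusion(image_feat, text_feat)

print(concat)    # no entry depends on both modalities
print(bilinear)  # every entry depends on one image and one text feature
```

In the concatenated vector each entry comes from exactly one modality, so any interaction has to be learned by later layers. In the outer-product vector every entry is a product of one image feature and one text feature, at the cost of a dimension that grows multiplicatively.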