| With the boom in digital creativity, the amount of image data being generated has exploded. Managing this volume of data effectively requires multi-level, multi-category organization of images. Manual classification is resource-intensive, so automatic classification is of great practical importance. Existing deep learning models for hierarchical image classification do not capture global features well in their lower layers, whereas the Vision Transformer (ViT) is a neural network that captures global contextual information through an attention mechanism, allowing it to extract more comprehensive features and perform well across a variety of visual recognition tasks. However, previous work on ViT has not exploited the hierarchical structure information embedded in images, which makes the model difficult to apply to multi-level, multi-class image classification tasks. In addition, real-world datasets often suffer from the long-tail problem, which degrades classification performance on imbalanced datasets. To address these challenges, this thesis studies multi-level, multi-class image classification based on ViT, aiming to build a base model for efficient multi-level, multi-category image management. It proposes a novel ViT model that exploits the hierarchical information of images and can perform multi-level, multi-class image classification. It also proposes a multi-level, multi-class image classification algorithm for imbalanced data, which alleviates the model's poor performance on imbalanced datasets. In summary, the main research contents and innovations of this thesis are as follows: (1) HFFVT (Hierarchical Feature Fusion Vision Transformer), a ViT model based on hierarchical feature fusion, is proposed. To address the problem that previous ViT
models are difficult to apply to multi-level image classification, this study uses the proposed HFFVT model to fill that gap. The model extracts features at different levels of an image according to the image's hierarchical labels and fuses them with the proposed layer feature fusion module to improve classification performance. The model is then compared with a variety of advanced deep learning models on four datasets, and the results show that it outperforms the other models compared. (2) A multi-level, multi-class image classification algorithm for imbalanced data is proposed to alleviate the model's poor multi-level, multi-class classification performance on imbalanced datasets. In this method, the HFFVT model first serves as the backbone network for multi-level, multi-class classification. Next, a Diverse RandAugment strategy is applied to the imbalanced dataset to expand the data, increasing the number of images in categories that have few images. Then, the hierarchical cross-entropy loss function is replaced with a re-weighted hierarchical loss function. Finally, extensive experiments with the HFFVT model and current mainstream deep neural networks verify the effectiveness of the proposed algorithm. The results show that the proposed algorithm substantially improves the recognition rate of tail categories on imbalanced datasets. |
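
The re-weighted hierarchical loss mentioned in contribution (2) can be illustrated with a minimal sketch. The abstract does not give the exact weighting scheme, so everything below is an assumption made for illustration: class weights are taken as inverse class frequencies (scaled to average 1), the per-level losses are combined as a weighted sum, and the function names `inverse_frequency_weights` and `reweighted_hierarchical_loss` are hypothetical rather than taken from the thesis.

```python
import math
from collections import Counter


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(labels, num_classes):
    """Per-class weights proportional to 1 / count.

    Assumed scheme (not from the thesis): weights are scaled so that they
    average to 1 over the classes that actually occur in `labels`.
    """
    counts = Counter(labels)
    raw = [1.0 / counts[c] if counts[c] > 0 else 0.0 for c in range(num_classes)]
    present = sum(1 for w in raw if w > 0)
    scale = present / sum(raw)
    return [w * scale for w in raw]


def reweighted_hierarchical_loss(logits_per_level, labels_per_level,
                                 class_weights_per_level, level_weights):
    """Loss for one image: weighted sum over hierarchy levels of the
    class-weighted cross-entropy at each level."""
    loss = 0.0
    for logits, label, class_w, level_w in zip(
            logits_per_level, labels_per_level,
            class_weights_per_level, level_weights):
        prob = softmax(logits)[label]
        loss += level_w * class_w[label] * -math.log(prob)
    return loss


# Toy two-level hierarchy: 2 coarse classes, 4 fine classes.
coarse_weights = inverse_frequency_weights([0, 0, 0, 1], num_classes=2)
fine_weights = [1.0, 1.0, 1.0, 1.0]  # uniform weights at the fine level

example_loss = reweighted_hierarchical_loss(
    logits_per_level=[[0.0, 0.0], [0.0, 0.0, 0.0, 0.0]],
    labels_per_level=[1, 2],
    class_weights_per_level=[coarse_weights, fine_weights],
    level_weights=[1.0, 0.5],
)
```

In this toy setup the rare coarse class (one sample out of four) receives a larger weight than the frequent one, so errors on tail categories contribute more to the loss. Setting every class weight to 1 recovers an ordinary level-weighted hierarchical cross-entropy.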