
A Fine-grained Classification Algorithm Based On Deep Learning

Posted on: 2022-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: W H Zhang
Full Text: PDF
GTID: 2518306476998689
Subject: Electronics and Communications Engineering
Abstract/Summary:
The rapid advancement of digital technology has brought drastic changes, and people's demands for a better material life are constantly rising. Artificial intelligence (AI) is accordingly expected to deliver stronger classification ability. Classification across different species has seen many achievements, but fine classification within the same species has progressed slowly, so researchers have begun to focus on fine-grained visual categorization.

Fine-grained visual categorization differs from ordinary image classification as follows. In ordinary classification the gap between classes is large and the task is comparatively easy. In fine-grained visual categorization the gap between classes is small and the gap within a class is large: objects from different classes look much alike, while objects from the same class can look very different. Fine-grained classification is therefore more difficult and requires the classifier to locate details so that it can identify objects through them.

This paper carries out a series of studies aimed at the difficulties of fine-grained visual categorization and analyzes the existing domestic and foreign methods. Global context information can improve the network's understanding of the image and enlarge the receptive field so that detail information is captured better, which further deepens that understanding. However, current networks do not understand the image as a whole in enough detail. This paper therefore adopts an improved weakly supervised data augmentation network (WS-DAN) structure that focuses on improving global context modeling, the receptive field, and the network's ability to obtain distinctive features. The main work is as follows:

(1) The existing domestic and foreign methods are analyzed, the advantages and disadvantages of each are compared, and the algorithm of this paper is derived from that comparison: a fine-grained visual categorization algorithm based on a modified WS-DAN.

(2) The global context block (GC block) is introduced to improve the localization of details. Fine-grained visual categorization based on WS-DAN suffers from inaccurate object localization, which is caused by weak dependence between pixels. This paper therefore uses the Non-local block to strengthen context modeling. At the same time, the GC block is introduced for long-range dependency modeling, which alleviates the heavy computation caused by the Non-local block's pixel-level modeling.

(3) The SE block is applied to improve the extraction of attention maps. The feature maps extracted by the backbone network contain parts that are not needed for fine-grained visual categorization, and obtaining attention maps by channel downsampling alone increases the training cost. The SE block estimates the importance of each channel and, according to that importance, strengthens the useful feature maps and weakens those that are useless for classification training. This helps the network obtain distinctive features and produce attention maps that are more helpful for locating details, which improves classification accuracy.

(4) The ASPP structure is used to increase the diversity of feature maps and to extract multi-scale information. Multi-scale information was previously extracted with an image pyramid pooling module. Although that module obtained different features, extraction was very slow, so a pooling pyramid with dilated convolution is used instead. The structure extracts features with multiple parallel dilated convolutions at different rates, which is fast and captures multi-scale information.

With these improvements, the deep neural network achieves gains on four public datasets. Top-1 accuracy on CUB-200-2011, FGVC-Aircraft, Stanford Cars and Stanford Dogs improves by 0.51%, 0.47%, 0.62% and 0.36% respectively compared with WS-DAN, reaching 89.91%, 93.47%, 95.12% and 92.56%, which validates the effectiveness of the algorithm. The study also has practical significance for daily life. For example, classification on CUB-200-2011 can let ordinary people identify birds they do not recognize, which is meaningful for the protection of bird species.
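
The channel reweighting described in contribution (3) can be illustrated with a minimal sketch of a squeeze-and-excitation style gate. This is not the thesis's implementation: the function name `se_reweight`, the hand-picked weights, the tiny 2-channel input and the omission of bias terms are all assumptions made for illustration, and plain Python lists stand in for tensors.

```python
import math


def se_reweight(feature_maps, w_reduce, w_expand):
    """Squeeze-and-excitation style channel gating (illustrative, no biases).

    feature_maps: list of C channels, each an H x W list of lists.
    w_reduce:     (C/r) x C weight matrix of the first fully connected layer.
    w_expand:     C x (C/r) weight matrix of the second fully connected layer.
    Returns (gates, scaled_feature_maps).
    """
    # Squeeze: global average pooling turns each channel into one number.
    squeezed = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_maps
    ]
    # Excitation, step 1: dimensionality reduction followed by ReLU.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed)))
        for row in w_reduce
    ]
    # Excitation, step 2: expansion followed by a sigmoid gate per channel.
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_maps)
    ]
    return gates, scaled


# Hypothetical 2-channel, 2x2 input and hand-picked weights (assumptions).
features = [
    [[4.0, 4.0], [4.0, 4.0]],  # channel 0: strong response
    [[1.0, 1.0], [1.0, 1.0]],  # channel 1: weak response
]
w_reduce = [[1.0, -1.0]]       # 1 x 2: reduce two channels to one hidden unit
w_expand = [[1.0], [-1.0]]     # 2 x 1: expand back to two channel gates

gates, scaled = se_reweight(features, w_reduce, w_expand)
print(gates)
```

With these weights the gate for channel 0 is close to 1 and the gate for channel 1 is close to 0, so the first channel is kept almost unchanged while the second is suppressed. In a trained network the two weight matrices are learned rather than chosen by hand.
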
Keywords/Search Tags: FGVC, weakly supervised learning, data augmentation network, Global Context, Channel attention, dilated convolution, SPP
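
The idea behind the dilated convolution keyword and the parallel multi-rate branches of contribution (4) can be sketched in one dimension. This is a simplified illustration under stated assumptions, not the ASPP module used in the thesis: the function names, the impulse input, the all-ones kernel and the rates (1, 2, 3) are hypothetical, and the fusion layer and image-level pooling branch that ASPP modules commonly include are left out.

```python
def dilated_conv1d(signal, kernel, rate):
    """1-D dilated (atrous) convolution with zero padding.

    Kernel taps are spaced `rate` samples apart, and the output has the
    same length as the input. As in deep learning frameworks, the kernel
    is not flipped (this is cross-correlation).
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate
            if 0 <= idx < len(signal):  # positions outside count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def effective_kernel_size(kernel_size, rate):
    """Span of input samples covered by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def parallel_dilated_branches(signal, kernel, rates):
    """Apply the same kernel at several rates in parallel, one output per rate."""
    return [dilated_conv1d(signal, kernel, r) for r in rates]


# Hypothetical input: a single impulse in the middle of a 9-sample signal.
signal = [0, 0, 0, 0, 1, 0, 0, 0, 0]
kernel = [1, 1, 1]
rates = (1, 2, 3)

branches = parallel_dilated_branches(signal, kernel, rates)
spans = [effective_kernel_size(len(kernel), r) for r in rates]
for r, span, branch in zip(rates, spans, branches):
    print(r, span, branch)
```

Each branch uses the same three weights, but the span of input it covers grows with the rate (3, 5 and 7 samples here). This is how parallel dilated convolutions gather multi-scale context without adding parameters or reducing resolution.
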