Research on Fine-Grained Recognition Based on Deep Learning

Posted on: 2024-06-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Y Wang
GTID: 2568307106982149
Subject: Electronic information
Abstract/Summary:
Fine-grained recognition based on deep learning aims to recognize and classify the different subcategories of objects within a meta-category. It has great application value in daily life, business, security and nature conservation. It shows great ecological value in rare animal and plant identification in particular. For example, there are seven known species of flycatchers in nature. Images of the same flycatcher species vary greatly in posture, background and size, while different species can share the same posture and a similar appearance. The Acadian flycatcher and the Great Crested flycatcher are the most common, so these two species account for most of the images. The other five species have relatively few images, and the dataset therefore has a long-tailed distribution. In addition, the number of training samples is often limited because manual annotation is expensive. In summary, fine-grained recognition faces four main challenges: "large intra-class variation", "small inter-class variation", limited training data and a long-tailed data distribution. These challenges make it more difficult than general-purpose image recognition. Deep learning techniques, represented by the Convolutional Neural Network (CNN), can cope effectively with them, so research on deep-learning-based fine-grained recognition is both urgent and necessary.

The fundamental task of fine-grained recognition research is to improve recognition accuracy by extracting good feature representations of sample images. The most commonly used methods are based either on discriminative region localization or on higher-order feature coding. Methods based on discriminative region localization can extract local features, but their localization is often crude, which can introduce noise and interfere with training. Current higher-order feature coding methods tend to apply second-order encoding only to the global features and neglect the features of the localized discriminative regions. Moreover, second-order features are high-dimensional and expensive to compute. To remedy these shortcomings, this thesis proposes two models, one for discriminative region localization and one for higher-order feature encoding.

(1) The first model addresses "large intra-class variation", "small inter-class variation" and the small proportion of the image occupied by the object to be recognized. It is a discriminative region localization model based on a multiplexed attention mechanism and reverse recognition. A multiplexed attention module inserted between the discriminative stream and the inverse stream enhances the discriminative features of the signature image. It also strengthens the learning of discriminative features, which alleviates "small inter-class variation". The network is trained to make the same verification decision for a signature image and its foreground/background-inverted counterpart. This encourages it to focus on stroke pixels when extracting features and reduces the interference of large background areas. A multiplexed recognition loss function is also proposed. It does not require the features of same-class samples to be arbitrarily close to each other, which suits "large intra-class variation". This method effectively improves the recognition accuracy of the discriminative region localization model.

(2) The second model is a second-order feature extraction model based on data augmentation and feature coding. It addresses "small inter-class variation", limited training data and the long-tailed distribution. The weight matrix of the fully connected layer is used to rank the channels of the feature map. The most important channels are then selected to locate the discriminative regions accurately and achieve accurate data augmentation. Second-order coding is applied to both the original and the augmented sample image. As a result, samples whose first-order features are identical can still be distinguished by their second-order features. In the second-order encoding step, a single CNN performs one feature extraction instead of the two extractions of a bilinear CNN, which effectively shortens training time. The model achieves accurate data augmentation and second-order feature coding simultaneously and has strong recognition capability.
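The channel-ranking step of model (2) can be illustrated with a minimal sketch. It makes several assumptions that are not taken from the thesis:

- The classifier head is global average pooling (GAP) followed by a fully connected layer.
- Each channel is scored by its contribution to the predicted class logit, that is, its FC weight times the channel's global average.
- The top-ranked channels are summed into an attention map, and that map is thresholded to get a crop box for augmentation.

The scoring rule, the `top_m` and `theta` parameters, and all function names are hypothetical. Plain Python lists stand in for tensors so the example runs without a deep learning framework.

```python
from typing import List, Tuple

# Hypothetical shapes: features is [C][H][W] (non-negative, e.g. after ReLU);
# fc_weights is [num_classes][C] from an assumed GAP + fully connected head.
Features = List[List[List[float]]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right)


def rank_channels(features: Features, fc_weights: List[List[float]], class_idx: int) -> List[int]:
    """Order channels by their contribution to the class logit: FC weight * global average."""
    scored = []
    for c, fmap in enumerate(features):
        gap = sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        scored.append((fc_weights[class_idx][c] * gap, c))
    scored.sort(reverse=True)
    return [c for _, c in scored]


def attention_box(features: Features, fc_weights: List[List[float]], class_idx: int,
                  top_m: int = 2, theta: float = 0.5) -> Box:
    """Sum the top_m channels into an attention map; return the inclusive grid box
    covering every cell whose attention is at least theta * peak."""
    chosen = rank_channels(features, fc_weights, class_idx)[:top_m]
    h, w = len(features[0]), len(features[0][0])
    att = [[sum(features[c][i][j] for c in chosen) for j in range(w)] for i in range(h)]
    peak = max(max(row) for row in att)
    # With non-negative features the peak cell always passes, so `cells` is never empty.
    cells = [(i, j) for i in range(h) for j in range(w) if att[i][j] >= theta * peak]
    rows = [i for i, _ in cells]
    cols = [j for _, j in cells]
    return min(rows), min(cols), max(rows), max(cols)


def to_image_box(box: Box, feat_hw: Tuple[int, int], img_hw: Tuple[int, int]) -> Box:
    """Map an inclusive feature-grid box to end-exclusive pixel coordinates for cropping."""
    (r0, c0, r1, c1), (fh, fw), (ih, iw) = box, feat_hw, img_hw
    return r0 * ih // fh, c0 * iw // fw, (r1 + 1) * ih // fh, (c1 + 1) * iw // fw
```

Take a 14×14 feature map from a 448×448 input as an example. The grid box (3, 4, 8, 9) maps to the pixel box (96, 128, 288, 320), which would then be cropped and resized to form the augmented sample. Multiplying the weight by the channel's average activation means a channel with a large weight but no activation ranks low. That seems a reasonable proxy for "importance", but it is only one possible choice.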
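Second-order encoding from a single CNN pass can be sketched as self-bilinear pooling. The feature map is paired with itself, so one extraction supplies both factors that a bilinear CNN would take from two streams. This is a minimal sketch that assumes the common recipe of a C×C Gram matrix, a signed square root and L2 normalization. The thesis may differ in details, and `encode_pair` (concatenating the original and augmented encodings) is a hypothetical fusion rule.

```python
import math
from typing import List

Features = List[List[List[float]]]  # [C][H][W] from a single backbone pass


def second_order_encode(features: Features) -> List[float]:
    """Self-bilinear pooling: average outer product of the feature map with itself."""
    flat = [[v for row in ch for v in row] for ch in features]  # C vectors of length N = H*W
    c, n = len(flat), len(flat[0])
    # C x C Gram matrix (1/N) * X X^T; the same features feed both factors.
    gram = [[sum(flat[a][k] * flat[b][k] for k in range(n)) / n for b in range(c)]
            for a in range(c)]
    vec = [g for row in gram for g in row]
    # Signed square root, then L2 normalization, as is common for bilinear features.
    vec = [math.copysign(math.sqrt(abs(g)), g) for g in vec]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def encode_pair(original: Features, augmented: Features) -> List[float]:
    """Hypothetical fusion: concatenate the encodings of the original and augmented views."""
    return second_order_encode(original) + second_order_encode(augmented)
```

The output still has C×C entries, which reflects the dimensionality cost of second-order features. The saving comes from needing only one forward pass instead of two.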
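The abstract does not give the form of the multiplexed recognition loss. The sketch below illustrates only its stated property, that same-class features are not pushed arbitrarily close together. It substitutes a double-margin contrastive pair loss, a standard technique used here purely for illustration and not the thesis's loss. Same-class pairs incur no loss while their distance stays within `pos_margin`, and different-class pairs are pushed beyond `neg_margin`. The margin values and function names are assumptions.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def double_margin_pair_loss(f1: Sequence[float], f2: Sequence[float], same_class: bool,
                            pos_margin: float = 0.5, neg_margin: float = 1.5) -> float:
    """Contrastive loss with a slack band for same-class pairs (pos_margin > 0)."""
    d = euclidean(f1, f2)
    if same_class:
        # Same-class pairs are tolerated up to pos_margin apart, leaving room for
        # large intra-class variation; a standard contrastive loss uses pos_margin = 0.
        return max(0.0, d - pos_margin) ** 2
    # Different-class pairs are penalized only while closer than neg_margin.
    return max(0.0, neg_margin - d) ** 2


def mean_pair_loss(pairs, **margins) -> float:
    """Average the loss over (f1, f2, same_class) triples."""
    return sum(double_margin_pair_loss(a, b, s, **margins) for a, b, s in pairs) / len(pairs)
```

With `pos_margin=0.0` the function reduces to an ordinary contrastive loss. In that case any nonzero distance between same-class features is penalized, which is exactly the behaviour the abstract says its loss avoids.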
Keywords/Search Tags: fine-grained recognition, feature coding, data augmentation, multipath loss function, attention mechanism