
Research On Few-shot Image Classification And Recognition Based On Deep Metric Learning

Posted on: 2024-05-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhang
Full Text: PDF
GTID: 1528307307453524
Subject: Control Science and Engineering
Abstract/Summary:
Artificial intelligence has developed rapidly in recent years. Deep learning methods, with their strong modeling and information-processing abilities, have attracted widespread attention and produced significant achievements in fields such as natural language processing and computer vision. These achievements depend on large amounts of data and on improvements in hardware computing power. As theoretical methods have progressed, reducing computational cost and manual annotation cost has also become a key concern for researchers. In data-scarce scenarios in particular, researchers hope to draw on human learning processes so that a model can infer general rules from a small number of samples and generalize well. In this context, research on few-shot learning has emerged. It has gradually become an important technical means of reducing data dependence and an important direction toward artificial general intelligence.

Although few-shot learning has made some progress, it is still limited in learning efficiency and flexibility, mainly for two reasons: model design and the use of prior knowledge. First, few-shot learning in the visual domain is a typical metric learning problem. Its challenge lies in the open-set data setting: the model must adapt to random combinations of categories when only a few samples are provided for each category. The measurement module and the feature extraction module are therefore both particularly important, and they must work together to achieve sufficient generalization ability, which distinguishes this setting from visual tasks under conventional settings. Second, human recognition ability is built on a foundation of prior knowledge: humans effectively summarize and transfer past experience, which enables rapid recognition and generalization. Current few-shot learning methods, however, lack targeted learning strategies for pre-trained knowledge.

In response to these issues, this dissertation analyzed them and proposed solutions across three lines of research. The first two focused on improving the design of the feature embedding module and the measurement module in metric learning; the third explored how to use the prior knowledge in pre-trained models to support fine-tuning on downstream tasks. The main work and contributions are summarized as follows.

First, to cope with the classification boundary drift caused by the random combination of categories in few-shot learning, this dissertation proposed an improved dynamically scaled softmax loss. The loss is based on hard-sample awareness and distance sensitivity. It generates adaptive scaling coefficients from the similarity between each query sample and the different categories, so that the penalty applied to each sample can be adjusted flexibly to overcome the ill-posed problem caused by classification boundary drift. On this basis, the dissertation further distinguished hard samples from easy samples according to their predictions and placed greater weight on learning from hard samples. Unlike previous hard and easy sample mining methods, the proposed approach more clearly widens the gap between the weights of correctly and incorrectly predicted samples. Guided by this loss function, the model divides the feature space among categories more clearly, and experiments show that the loss helps to obtain more generalizable features with stronger representation ability.

Second, this dissertation extended the idea of dynamic scaling to feature embedding extraction. Because dynamic scaling amplifies differences while preserving similarity, distributions with higher values are clustered into a single category. On this basis, the dissertation proposed a mutually exclusive attention mechanism for few-shot contrastive learning scenarios. Unlike other attention mechanisms, it focuses on enhancing the foreground semantics of only one category. To better coordinate with attention generation, a prototype calibration module was also designed to improve the generation and description of category prototype features. Through the close cooperation of these two parts, the expressive ability of the features is enhanced and performance is improved.

Finally, this dissertation proposed a few-shot learning method based on fine-tuning the pre-trained features of large models. The filters of pre-trained large models are redundant in their feature representation, so a corresponding filtering mechanism was designed to obtain category-specific feature representations. Specifically, feature selection is achieved by selecting task-relevant filters, and this feature selection strategy serves as learned meta-information that generalizes to downstream category recognition tasks. Feature selection is implemented by a projection matrix composed of a set of orthogonal bases. The projection direction is determined by maximizing intra-class similarity and minimizing inter-class similarity, and the pre-trained features are then projected onto the subspace spanned by the orthogonal bases to enhance discriminability. To further improve and constrain the learning of the projection matrix, the class prototypes are used as a dictionary: sparse representations of the query samples of each category are obtained, and an approximation constraint is used to enhance semantic consistency among features of the same category. Feature selection and sparse representation jointly support fine-tuning for downstream few-shot learning tasks and demonstrate good applicability across categories.
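
As a rough illustration of the metric-learning setting and the dynamically scaled softmax idea summarized in the abstract, the sketch below classifies a query by cosine similarity to class prototypes and applies a per-query scale before the softmax. The abstract does not give the actual formula, so everything here is an assumption made for illustration: the mean-of-support prototypes, the function names, and in particular the rule in `adaptive_scale` (a smaller gap between the two best similarities is treated as a harder query and receives a larger scale) are hypothetical and are not the dissertation's method.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two vectors (epsilon avoids division by zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def class_prototypes(support):
    """support maps each label to its few support feature vectors."""
    return {label: mean_vector(vecs) for label, vecs in support.items()}

def adaptive_scale(sims, base=10.0, gain=10.0):
    """Hypothetical rule: the closer the two best similarities,
    the harder the query, and the larger the softmax scale."""
    top = sorted(sims, reverse=True)
    margin = min(max(top[0] - top[1], 0.0), 1.0)
    return base + gain * (1.0 - margin)

def scaled_softmax(sims, scale):
    """Numerically stable softmax over scale * similarity."""
    logits = [scale * s for s in sims]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(query, prototypes):
    """Return (labels, class probabilities, scale used) for one query.
    Requires at least two classes."""
    labels = sorted(prototypes)
    sims = [cosine(query, prototypes[label]) for label in labels]
    scale = adaptive_scale(sims)
    return labels, scaled_softmax(sims, scale), scale

def query_loss(query, true_label, prototypes):
    """Cross-entropy of the scaled softmax for a single query."""
    labels, probs, _ = classify(query, prototypes)
    return -math.log(probs[labels.index(true_label)])
```

With a two-way, two-shot episode such as `{"cat": [[1.0, 0.0], [0.9, 0.1]], "dog": [[0.0, 1.0], [0.1, 0.9]]}`, a query near the class boundary receives a larger scale than a query close to one prototype, which is the kind of sample-dependent penalty the abstract describes in general terms.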
Keywords/Search Tags: Few-shot Learning, Metric Learning, Attention Mechanism, Fine-tuning of the Pre-training Features