| Long-tailed visual recognition has long been an active research topic in computer vision. The long-tail factor makes the problem more general: any naturally collected dataset exhibits some degree of long-tailed imbalance, yet this imbalance is often ignored because it is usually sidestepped by manually rebalancing the number of samples per category. Research on the long-tail problem has produced several classical approaches, generally divided into three categories: re-sampling, which reduces the difference in sample counts across categories; re-weighting, which reduces the difference in weight between head and tail categories; and knowledge transfer, which carries knowledge learned from head categories over to tail categories. Some existing work on long-tailed visual recognition has the following problems. (1) [...] or why some easily confused samples are often misclassified by the model because of non-category factors such as background. (2) cRT, LWS, and other recent methods that achieve state-of-the-art results on traditional inter-class long-tail learning tasks do not change how features are learned; they only apply a rebalancing operation to the linear classifier. This typically only shifts the classifier's decision boundary away from the tail classes and toward the head classes. Although overall performance on the test set improves, performance on the head classes drops, and this trade-off is masked by the class-balanced test set. Improving overall performance at the expense of the head classes is unreasonable: although the goal of long-tailed visual recognition is to learn a class-balanced model, head-class samples are still the ones encountered most often at inference time. Meanwhile, RIDE, BBN, and other models that account for the shift in non-category factors and change how features are learned have, to some extent, alleviated this trade-off between head and tail performance. In view of the above, this thesis presents a causal-inference-based modeling method for long-tailed visual recognition that explicitly models the non-category factors affecting model performance, divides the long-tail shift problem into two parts, and gives a corresponding strategy for each. Combining the traditional long-tail evaluation metric, Top-1 accuracy, with causal inference, it gives a Top-1 accuracy metric that evaluates model performance more accurately and comprehensively. The thesis systematically analyzes the problems in existing work on visual recognition in the long-tail setting and gives MRNet, a model that integrates multiple balancing strategies to weaken the negative impact of both inter-class and intra-class imbalance. The experiments use long-tailed datasets such as ImageNet-LT and CIFAR-LT; they first verify the prior conclusions used in the modeling, and then comprehensively evaluate model performance under two evaluation metrics and two test methods. The results show that the modeling method given in this thesis explains the inconsistent performance within the same class, and that MRNet overcomes the performance trade-off between head and tail classes to a certain extent, with good results. The work also provides a new direction for further exploration of this topic. |
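
The head-tail trade-off described above can be made concrete with a small sketch. It is not the thesis's evaluation metric or code. It only illustrates, on made-up labels, how overall Top-1 accuracy on a class-balanced test set can rise while head-class accuracy falls. The function name `top1_by_group`, the `head_threshold` cutoff, and the toy class counts are all assumptions introduced for illustration.

```python
def top1_by_group(y_true, y_pred, train_counts, head_threshold=100):
    """Return overall, head-class, and tail-class Top-1 accuracy.

    A class counts as "head" when its training-set sample count is at
    least `head_threshold` (a hypothetical cutoff); otherwise it is "tail".
    """
    hits = {"overall": [], "head": [], "tail": []}
    for target, pred in zip(y_true, y_pred):
        correct = int(target == pred)
        group = "head" if train_counts[target] >= head_threshold else "tail"
        hits["overall"].append(correct)
        hits[group].append(correct)
    # Mean of the 0/1 hit indicators per group; NaN if a group is empty.
    return {g: (sum(v) / len(v) if v else float("nan")) for g, v in hits.items()}


# Toy long-tailed setup: class 0 is a head class, class 1 is a tail class.
train_counts = {0: 1000, 1: 10}

# Class-balanced test set: four samples per class.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]

# Hypothetical predictions before and after rebalancing the classifier.
pred_baseline = [0, 0, 0, 0, 0, 0, 0, 1]
pred_rebalanced = [0, 0, 1, 1, 1, 1, 1, 1]

baseline = top1_by_group(y_true, pred_baseline, train_counts)
rebalanced = top1_by_group(y_true, pred_rebalanced, train_counts)

print(baseline)    # overall 0.625, head 1.0, tail 0.25
print(rebalanced)  # overall 0.75, head 0.5, tail 1.0
```

In this toy case the overall score improves from 0.625 to 0.75 even though head-class accuracy halves, which is the kind of loss that a single overall number on a balanced test set does not reveal.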