With the development of deep convolutional neural networks and the accumulation of large-scale data, deep learning-based object detection has become an important research direction in autonomous driving and a core task in computer vision. Mainstream object detection algorithms have achieved significant accuracy improvements on manually collected and curated datasets, but most of them are designed and validated under the assumption of a relatively balanced data distribution. In real-world traffic scenarios, however, the collected data not only suffer from occlusion, varying object sizes, and differing lighting conditions, but also exhibit a long-tailed class distribution in which the numbers of samples per class are extremely imbalanced. This imbalance leads to unequal performance across categories in the classification head and makes it difficult for models to learn effective representations for all categories. This dissertation focuses on the long-tail problem in multi-object detection in traffic scenarios. First, to address the difficulty that data resampling and cost-sensitive learning have in balancing classifier performance, a class-balanced multi-level learning approach is proposed. Second, to address the insufficient learning of category-specific features on long-tailed traffic datasets, a supervised contrastive learning method is introduced to enhance feature representations. The main research contents and contributions of this dissertation are as follows:

1. A learning method based on feature grouping and category refinement is proposed. When training on imbalanced data, classes with many samples suppress classes with few samples. Resampling and cost-sensitive learning can alleviate this suppression of tail classes by head classes, but they often sacrifice head-class performance to improve tail-class performance, so the overall gain is limited. To reduce the direct competition between head and tail classes, the classes are divided into mutually exclusive groups according to their sample counts, and a separate classifier is used within each group, so that the sample counts within each group are roughly balanced and the suppression of tail classes by head classes is alleviated. However, this grouping scheme requires multiple classifiers, and the features extracted by a single feature extractor may not be fine-grained enough, which limits classifier performance. Therefore, additional fine-grained feature extraction is performed for the samples of each group, as sketched below. Furthermore, the inter-group imbalance introduced by grouping classes according to sample counts is addressed by jointly adjusting the multi-level classifiers. Experimental results show that, compared with the baseline model, the multi-level refinement learning approach yields a more balanced learning process and partially resolves the performance imbalance of the classifiers.
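As a minimal illustration of the grouped, multi-branch head described above, the following PyTorch-style sketch partitions classes into groups and gives each group its own refinement branch and classifier. The group partition, feature dimension, and module names are assumptions for illustration, not the dissertation's exact implementation.

```python
# Illustrative sketch (not the dissertation's exact implementation): classes are
# partitioned by sample count, each group gets its own fine-grained refinement
# branch and classifier, and the per-group logits are merged for prediction.
import torch
import torch.nn as nn


class GroupedClassificationHead(nn.Module):
    def __init__(self, feat_dim, class_groups):
        # feat_dim: dimension of the shared RoI/backbone feature.
        # class_groups: list of lists of class indices, e.g. head/medium/tail
        #               groups split by per-class sample counts (assumed).
        super().__init__()
        self.class_groups = class_groups
        self.num_classes = sum(len(g) for g in class_groups)
        # One fine-grained refinement branch and one classifier per group.
        self.refiners = nn.ModuleList([
            nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU(inplace=True))
            for _ in class_groups
        ])
        self.classifiers = nn.ModuleList([
            nn.Linear(feat_dim, len(g)) for g in class_groups
        ])

    def forward(self, shared_feat):
        # shared_feat: (N, feat_dim) features from the single shared extractor.
        logits = shared_feat.new_zeros(shared_feat.size(0), self.num_classes)
        for refiner, classifier, group in zip(
            self.refiners, self.classifiers, self.class_groups
        ):
            refined = refiner(shared_feat)          # group-specific refinement
            logits[:, group] = classifier(refined)  # scatter into full logits
        return logits


# Example: 8 classes split into a head group and a tail group by sample count.
head = GroupedClassificationHead(feat_dim=256,
                                 class_groups=[[0, 1, 2], [3, 4, 5, 6, 7]])
print(head(torch.randn(4, 256)).shape)  # torch.Size([4, 8])
```

In this sketch, because each classifier only scores the classes of its own group, head classes no longer compete directly with tail classes inside a single softmax; balancing the groups against one another is left to the joint adjustment of the multi-level classifiers described above.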
2. A supervised contrastive learning method oriented to representation enhancement with limited positive examples is proposed. Because some image features in long-tailed datasets are difficult to learn and represent adequately, contrastive learning is used to enhance the representational capability of the model. Unlike the conventional pipeline, in which unsupervised contrastive learning pre-trains on ImageNet and the model is then fine-tuned on the target dataset, this method applies a contrastive learning mechanism to images of the target dataset during training, maximizing the mutual information among image features and improving the stability of the model. Specifically, differently augmented views of an image are used as inputs, and their features are obtained with a feature extractor and compared against one another. To better exploit the available labels, this dissertation proposes a positive and negative sample selection strategy suited to object detection, which associates corresponding features so that the deeper features of the generated images retain richer semantic information. In addition, the number of positive examples available per category for contrast is itself imbalanced, so a contrastive learning method with a limited number of positive examples is proposed to achieve balanced feature learning; a sketch of such a loss follows the summary below. Experimental results show that the image features produced by this method reduce intra-class distances and enlarge inter-class distances in the embedding space, enabling the model to learn the features of each class fully and yielding a significant performance improvement.

In summary, this dissertation proposes two methods for class imbalance in multi-object detection in traffic scenarios, addressing the difficulty of balancing classifier performance and the limited representation capability of models in existing approaches. Experiments are conducted on two long-tailed traffic multi-object detection datasets, and the results show that, compared with the baseline model, the proposed methods achieve significant performance improvements on both datasets and effectively alleviate class imbalance in multi-object detection in traffic scenarios.
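The sketch below illustrates one possible form of a supervised contrastive loss with a per-anchor cap on positives, as referenced in contribution 2. The cap value, function name, and random sampling of positives are assumptions made for illustration rather than the dissertation's exact formulation.

```python
# Illustrative sketch (assumptions, not the dissertation's exact formulation):
# a supervised contrastive loss over detection features in which the number of
# positives per anchor is capped so that head classes cannot dominate.
import torch
import torch.nn.functional as F


def limited_positive_supcon_loss(features, labels, max_positives=4, tau=0.1):
    # features: (N, D) embeddings of region features from augmented views.
    # labels:   (N,) class labels; same-class pairs are candidate positives.
    # max_positives: per-anchor cap on positives (assumed hyper-parameter).
    feats = F.normalize(features, dim=1)
    sim = feats @ feats.t() / tau                       # (N, N) similarities
    n = sim.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # Cap the positives per anchor: randomly keep at most `max_positives`
    # same-class pairs so head classes do not out-weight tail classes.
    keep = torch.zeros_like(pos_mask)
    for i in range(n):
        pos_idx = pos_mask[i].nonzero(as_tuple=False).squeeze(1)
        if pos_idx.numel() > max_positives:
            pos_idx = pos_idx[torch.randperm(pos_idx.numel())[:max_positives]]
        keep[i, pos_idx] = True

    # InfoNCE-style log-probability over all non-self pairs.
    logits = sim.masked_fill(self_mask, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # Average over the (capped) positives of each anchor that has any.
    pos_log_prob = log_prob.masked_fill(~keep, 0.0)
    pos_counts = keep.sum(dim=1)
    has_pos = pos_counts > 0
    loss = -pos_log_prob.sum(dim=1)[has_pos] / pos_counts[has_pos]
    return loss.mean()


# Example usage with random detection embeddings and labels.
feats = torch.randn(16, 128)
labels = torch.randint(0, 3, (16,))
print(limited_positive_supcon_loss(feats, labels))
```

Capping the positives equalizes how strongly each anchor is pulled toward its class regardless of class frequency, which is the balanced feature learning effect described in contribution 2.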