Long-tailed Data Classification Based On Multi-granularity Feature Optimization

Posted on: 2024-05-28 | Degree: Master | Type: Thesis
Country: China | Candidate: W Zhao
GTID: 2568307064455714 | Subject: Computer technology
Abstract/Summary:
In recent years, with the rapid arrival of the big data era, machine learning has become a key tool for solving various practical problems. In traditional classification tasks, the training data distribution is often artificially balanced, and the number of samples in different classes does not differ significantly. A balanced dataset relaxes the robustness requirements on the algorithm and, to some extent, guarantees the reliability of the trained model. However, as the number of categories of interest increases, maintaining a balance between categories brings an exponential increase in acquisition costs. A fundamental assumption for balanced datasets is that the number of samples in each class is approximately uniformly distributed, yet real-world data are often extremely imbalanced. Real-world data usually follow a long-tailed distribution, in which a few frequent classes (head classes) dominate, while the many rare classes (tail classes) each have a negligible number of samples.

Current classification methods for long-tailed data have three main limitations: (1) Some methods, such as re-sampling, change the original data distribution. (2) Most long-tailed learning methods based on transfer learning transfer knowledge from head features to tail features; they assume that classes are independent and ignore the multi-granularity relationship between classes. (3) Methods that transfer head features to tail features overlook the gap between the two, which becomes a problem when that gap is particularly large. To address these limitations, we propose a long-tailed data classification method based on multi-granularity feature optimization that fully mines the raw data and exploits the multi-granularity relationships between categories. The main contents include:

(1) A long-tailed data classification method based on attention-mechanism feature enhancement. In long-tailed data classification, re-sampling methods change the distribution of the raw data and, to some extent, reduce the representation power of the learned features, which affects the tail feature space. Therefore, an attention-based feature enhancement method is used to enhance the tail features and improve classification ability. This is feature enhancement at a single granularity. Unlike traditional long-tailed data classification methods, the proposed method does not change the distribution of the original data, and it fully mines the discriminative feature information of the long-tailed classes without compromising the representation power of the learned features.

(2) A long-tailed data classification method based on multi-granularity feature fusion. This method targets the invalid transfer caused by the gap between head and tail knowledge in transfer-learning approaches to long-tailed data classification, which also ignore the multi-granularity relationship between classes. A multi-granularity knowledge graph is therefore built from a class space that has multi-granularity relations, and this graph is recast into coarse-grained and fine-grained losses that guide the transfer. Moreover, the proposed multi-scale feature fusion network fully mines the rich information in the features to drive the transfer, thereby eliminating invalid transfer.
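
The idea of recasting a class hierarchy into coarse-grained and fine-grained losses can be illustrated with a minimal sketch. The abstract does not give the loss formulas, so everything below is an assumption made for illustration and not the thesis's actual method: the function name `multi_granularity_loss`, the `fine_to_coarse` mapping (a two-level stand-in for the knowledge graph), the choice of summing fine-class probabilities to obtain a coarse-class probability, and the `coarse_weight` parameter are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multi_granularity_loss(logits, fine_label, fine_to_coarse, coarse_weight=0.5):
    """Cross-entropy at two granularities for a single sample.

    logits         : scores over the fine-grained classes
    fine_label     : index of the true fine-grained class
    fine_to_coarse : fine_to_coarse[i] is the coarse class of fine class i
                     (hypothetical two-level hierarchy)
    coarse_weight  : assumed trade-off between the two loss terms

    Returns (total_loss, fine_loss, coarse_loss).
    """
    probs = softmax(logits)

    # Fine-grained term: standard cross-entropy on the true fine class.
    fine_loss = -math.log(probs[fine_label])

    # Coarse-grained term: probability mass of all fine classes that share
    # the true class's parent, then cross-entropy on that mass.
    coarse_label = fine_to_coarse[fine_label]
    coarse_prob = sum(p for p, c in zip(probs, fine_to_coarse) if c == coarse_label)
    coarse_loss = -math.log(coarse_prob)

    return fine_loss + coarse_weight * coarse_loss, fine_loss, coarse_loss


if __name__ == "__main__":
    # Four fine classes grouped under two coarse classes.
    fine_to_coarse = [0, 0, 1, 1]
    total, fine, coarse = multi_granularity_loss(
        [2.0, 1.5, -1.0, -2.0], fine_label=1, fine_to_coarse=fine_to_coarse
    )
    print(f"fine={fine:.4f} coarse={coarse:.4f} total={total:.4f}")
```

In this sketch, a prediction that confuses a tail class with a sibling under the same coarse class is penalized only by the fine-grained term, whereas a prediction that lands in a different coarse class is penalized by both terms. This is one plausible way a hierarchy could guide learning for tail classes that have few samples of their own.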
Keywords/Search Tags: Long-tailed distribution learning, Feature optimization, Transfer learning, Attention mechanism, Multi-granularity knowledge graphs