Research On Data-Efficient Learning For Vision Tasks

Posted on: 2024-03-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Yu
Full Text: PDF
GTID: 1528306932457774
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development and popularization of deep learning technology, deep learning-based models have achieved significant breakthroughs in various fields. The key to the success of deep learning technology lies in training models with massive amounts of data. However, in real-world scenarios, collecting sufficient data is often challenging due to cost, security, privacy constraints, etc. Therefore, it is important to achieve data-efficient learning, i.e., training high-performance models with limited data. In the field of computer vision, extensive research has been conducted on data-efficient learning. However, most methods tend to overlook the intrinsic characteristics of visual signals and tasks during the modeling process, leading to suboptimal adaptation of models and algorithms tailored to vision tasks. Meanwhile, few studies have systematically investigated the problem of data-efficient learning for vision tasks.

To tackle these challenges, this dissertation proposes five methods from two complementary perspectives: internal data knowledge mining and external data knowledge transfer. The internal perspective includes self-supervised learning, data usage optimization, and data augmentation, which aim to extract as much information as possible from the existing data. In contrast, the external perspective focuses on transfer learning and virtual data generation, leveraging knowledge from related domains or using extra models to generate virtual data. The specific contents are organized as follows:

(1) To address the challenge of mining effective contextual information in self-supervised learning (SSL), this dissertation proposes an SSL method based on mask-based latent reconstruction modeling. Unlike mainstream SSL methods, such as masked image modeling that reconstructs missing content in the original space, our method focuses on reconstructing missing content information in a compact latent space. This approach effectively avoids the reconstruction of redundancies and distractions, and promotes the model's ability to mine the most effective contextual information for the task. Additionally, the proposed SSL objective is introduced as an auxiliary task to the current task, enabling better task adaptability. Experimental results on multiple data efficiency benchmarks for vision-based reinforcement learning verify the effectiveness of the proposed method in improving data efficiency.

(2) To address the issue of insufficient consideration of task characteristics in data usage optimization, this dissertation proposes a task-customized data optimization method. Specifically, using image inpainting with prominent task characteristics as an example, this dissertation demonstrates how to design effective feature normalization (FN) based on task characteristics. First, the dissertation theoretically shows the existence of feature mean and variance shift issues in the existing FN methods used in image inpainting models, and then designs a region-based FN method. This method takes into account the spatial distribution of features, dividing features into multiple regions, and normalizing each region separately. This effectively avoids the aforementioned mean and variance shift issues. Finally, experimental results on multiple image datasets show that the proposed method effectively improves the data efficiency and final performance of the model.

(3) To address the lack of exploration of image local diversity in data augmentation, this dissertation proposes a patch-wise automatic data augmentation method. This method effectively avoids the issues found in previous image-wise automatic data augmentation methods, such as underutilization of image local diversity and loss of critical image semantics. The proposed method achieves this through fine-grained control, i.e., at the patch level. Specifically, the method divides an image into multiple patches and utilizes a multi-agent reinforcement learning algorithm to automatically determine the optimal augmentation operation for each patch. Finally, experimental results on various vision tasks, including regular image classification, fine-grained image classification, and object detection, demonstrate the effectiveness of the proposed method in improving data efficiency.

(4) To address the problem of inefficient transfer of prior knowledge in pre-trained models during transfer learning, this dissertation proposes an efficient transfer learning method based on task residuals. This method effectively avoids the issues found in previous transfer learning methods, such as damaging prior knowledge and insufficient learning of task-specific knowledge. Specifically, when using a pre-trained model for transfer learning on the current task, the method introduces a set of learnable parameters. These parameters are independent of the pre-trained features and are directly added to the pre-trained features or classifier for tuning. Experimental results on multiple diverse vision tasks demonstrate that the proposed method effectively improves the data efficiency of the model.

(5) To address the issue of unreliable virtual data generated by external generative models, this dissertation proposes a cycle-consistency constraint designed according to the characteristics of the task environment. Taking vision-based reinforcement learning tasks as an example, the dissertation trains dynamics models and samples virtual actions to generate virtual trajectories through these dynamics models. This process implicitly transfers the knowledge of the external environment to the virtual data. Next, the cycle-consistency constraint is adopted to ensure that the generated trajectories conform to the physical rules of the environment. These reliable virtual data are utilized to enhance the visual representation learning of the target task model, thereby improving data efficiency. Experimental results on multiple data efficiency benchmarks of vision-based reinforcement learning show that the proposed method effectively improves the data efficiency of the model.
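
As an illustration of the region-based feature normalization described in item (2), the following is a minimal sketch and not the dissertation's implementation. It assumes that the regions are the corrupted and uncorrupted areas given by a binary inpainting mask, that statistics are computed per channel for a single sample, and that no learnable scale or shift is applied; the abstract specifies none of these details. The function name `region_normalize` and its parameters are hypothetical.

```python
import numpy as np


def region_normalize(features, mask, eps=1e-5):
    """Normalize each spatial region of a feature map separately.

    features: array of shape (C, H, W).
    mask:     array of shape (H, W); nonzero marks the corrupted (hole) region.
              Using the inpainting mask to define regions is an assumption.
    """
    features = np.asarray(features, dtype=np.float64)
    hole = np.asarray(mask).astype(bool)
    out = np.zeros_like(features)

    # Process the hole region and the valid region independently so that
    # statistics from one region do not shift the other.
    for region in (hole, ~hole):
        if not region.any():
            continue  # skip an empty region (e.g. a mask with no holes)
        vals = features[:, region]                  # shape (C, N_region)
        mean = vals.mean(axis=1, keepdims=True)     # per-channel mean
        var = vals.var(axis=1, keepdims=True)       # per-channel variance
        out[:, region] = (vals - mean) / np.sqrt(var + eps)
    return out
```

Normalizing over the whole feature map instead would mix the statistics of the two regions, which is one plausible reading of the mean and variance shift the abstract refers to.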
Keywords/Search Tags: Vision Task, Data Efficiency, Deep Learning, Knowledge Mining, Knowledge Transfer