| 3D hand mesh recovery aims to recover the pose and shape of the whole hand from a given image. Because this task has broad applications in virtual/augmented reality, robot grasping, smart home technology, and other fields, it has become one of the hot spots of computer vision research in recent years. However, given the high degrees of freedom, self-occlusion, and self-similarity of hands, some methods rely on ground-truth 3D hand labels for fully or strongly supervised training, which limits their practicality because 3D annotations are difficult to obtain. Meanwhile, existing methods tend to focus on performance on a particular dataset rather than on finding a general mapping rule, so their performance usually drops significantly when the model is transferred to a new scenario for testing, and fine-tuning the network parameters on the new scene leads to catastrophic forgetting, which limits the application scenarios of the network. To solve these problems, and inspired by the human ability for lifelong learning, this paper introduces the concepts of "association" and "accumulation" into hand-related tasks and proposes a 3D hand mesh recovery method based on self-supervised continual learning. First, the method learns rich hand prior information through fully supervised pre-training, and then transfers the trained model to a new scene for self-supervised learning, computing constraints between the depth maps/hand masks rendered from the estimated hand model and their corresponding ground-truth labels. This "association" process between the old and new scenes can significantly reduce the dependence on 3D hand labels. Second, to keep the network from focusing too much on the data distribution of the new scenes and catastrophically forgetting the knowledge learned during pre-training or on the old scenes, this paper uses a continual learning method that lets the model incrementally "accumulate" gesture and background information from a variety of scenarios and search for a general image-to-hand mapping rule, so that the model can ultimately be better applied to complex real-world scenarios. In addition, to alleviate the shape distortion of the predicted mesh that arises without supervision from ground-truth hand meshes, this paper proposes a hand shape reality refinement network, which post-processes the hand mesh estimates to make their shapes more realistic. Extensive experiments are carried out on three hand datasets. The experiments confirm the rationality of the network designs for cross-dataset learning, and detailed ablation studies analyze the effects of the various constraints and hyper-parameters. Furthermore, comparisons with state-of-the-art methods on different datasets demonstrate the rationality and superiority of the self-supervised continual learning method. Finally, as a step towards practical application, a real hand dataset is collected and the trained model is tested directly on it, which verifies the practicality of the proposed method. |