| Face recognition has become one of the most widely used biometric technologies. Benefiting from the development of deep learning, face recognition accuracy has improved dramatically and has even surpassed human performance on some indicators. An important reason for this progress is the availability of large datasets. However, large datasets bring a challenging problem: noise. Since datasets are often collected by crawlers, some noise is unavoidable. At the same time, the amount of labeled data is still insignificant compared to the amount of unlabeled data on the web. If this unlabeled data can be utilized, it can reduce the cost of dataset collection, improve model accuracy, and enhance model generalization. Therefore, training with unlabeled data is also an extremely valuable research area. To address the problem that large face datasets contain a large amount of noise, this thesis proposes an anti-noise face recognition training algorithm. The thesis studies the characteristics of the noise and, during training, analyzes the distance between each sample and the class centers of the different classes to identify in real time whether the sample is open-set noise or closed-set noise. The two kinds of noise are treated differently so that both can be exploited. A dedicated loss function is designed for open-set noise, since such samples still carry information that is beneficial to the model's performance. The method requires neither prior knowledge about the noise nor multiple rounds of training, and it can be combined with any classification-based loss function to achieve better results. Extensive experiments demonstrate the effectiveness of the method: models trained with it improve consistently, and the accuracy of noise identification and correction is very high. The method also yields some improvement when training on a clean dataset.
To address the problem that a large amount of unlabeled data cannot be used, and that much of this unlabeled data is shallow data, this thesis designs a loss function that utilizes unlabeled data. The method first studies the distribution of shallow data in the feature space to determine how the loss function should be improved. By embedding shallow unlabeled data into the weight matrix and imposing penalty boundaries on these data, the method squeezes the feature space of the labeled data and improves model accuracy. At the same time, because the unlabeled data is shallow data, a larger, dynamic penalty boundary is imposed on it to ensure a better feature space. Experiments show that this method consistently improves the performance of models trained on labeled datasets while reducing the data-labeling workload. In summary, this thesis first proposes an anti-noise algorithm that addresses noise in labeled data and prevents the resulting degradation of model accuracy. With the noise problem addressed, it then proposes an algorithm that utilizes unlabeled data to further improve the performance of the face recognition model. |
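
The abstract does not give the exact rule used to separate open-set from closed-set noise, so the following is only a minimal sketch of the general idea of comparing a sample with the class centers. The function names, the use of cosine similarity, and the `threshold` value are all assumptions made for illustration and are not taken from the thesis: a sample that is far from every class center is treated as open-set noise, while a sample that is closest to the center of a class other than its label is treated as closed-set noise and assigned a corrected label.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_sample(feature, label, centers, threshold=0.5):
    """Return (kind, corrected_label) for one training sample.

    Hypothetical decision rule (an assumption, not the thesis's rule):
    - similarity to every class center below `threshold` -> open-set noise
    - nearest center belongs to a different class        -> closed-set noise
    - otherwise                                          -> clean
    """
    sims = [cosine(feature, center) for center in centers]
    nearest = max(range(len(centers)), key=sims.__getitem__)
    if sims[nearest] < threshold:
        # The identity is probably not among the dataset's classes.
        return "open-set noise", None
    if nearest != label:
        # The identity is probably another class in the dataset.
        return "closed-set noise", nearest
    return "clean", label


# Toy class centers for two identities in a 2-D feature space.
centers = [[1.0, 0.0], [0.0, 1.0]]
clean = classify_sample([0.9, 0.1], 0, centers)
mislabeled = classify_sample([0.1, 0.9], 0, centers)
outlier = classify_sample([-1.0, -1.0], 0, centers)
```

In an actual training loop the class centers would more plausibly be the rows of the classifier's weight matrix, and the threshold would need to be tuned or scheduled; both choices are left open here.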