
Research On Knowledge Distillation Algorithms Based On Output Responses In Convolutional Neural Networks

Posted on: 2024-06-20 | Degree: Master | Type: Thesis
Country: China | Candidate: B T Shao | Full Text: PDF
GTID: 2568307127453974 | Subject: Control Science and Engineering
Abstract/Summary:
Convolutional neural networks (CNNs) play a major role in computer vision and show strong performance on many tasks. In recent years, researchers have designed increasingly powerful network architectures to improve the representational ability of CNNs, but most of these architectures are difficult to deploy on devices with limited computing power because of their high complexity. Model compression techniques have therefore attracted much attention. Knowledge distillation, one such compression method, builds on transfer learning: it transfers the knowledge of a large network, usually called the teacher network, to a small network, called the student network, which is then deployed, thereby achieving model compression. Knowledge based on output responses, one of the kinds of knowledge transferred in distillation, refers to the numerical values, probability distributions and other information contained in the direct output of the teacher network; this information serves as supervision for task-driven training of the student network. Output response-based knowledge distillation has attracted much attention from practitioners because it is theoretically well grounded and easy to use, and it has made considerable progress in both offline and online knowledge distillation. However, because output-response knowledge is a feature with statistical properties, the supervision it provides can be overly abstract and insufficiently stable when distilling the student network, so the performance of output response-based distillation remains limited. This thesis studies this performance limitation in CNNs, and its main contributions are as follows.

1) A multi-granularity knowledge distillation mechanism is proposed to address the performance limitation caused by the capacity gap between networks in output response-based offline knowledge distillation. The mechanism transfers the teacher network's knowledge at multiple granularities to ease the student network's learning and understanding. To let the teacher construct multi-granularity knowledge, a granularity self-analysis module is designed that passes the teacher's native knowledge to an abstracted-knowledge encoder and a detailed-knowledge encoder. To transfer the constructed multi-granularity knowledge, a multi-granularity knowledge distillation module is proposed, together with a granularity-wise distillation scheme and a stable excitation distillation scheme. Distillation experiments are conducted on several open-source datasets with multiple teacher-student pairs. On CIFAR-100 the proposed mechanism improves accuracy by 0.35% on average compared with advanced methods, and on Market-1501 it reaches 94.50% Rank-1 and 84.30% mAP, demonstrating the effectiveness of multi-granularity knowledge distillation. The transferability of multi-granularity knowledge is verified through visualization and quantitative analysis, and the experiments further show that the mechanism improves the student's transferability and robustness to noisy inputs.
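For context, the following is a minimal sketch of the standard output-response (soft-target) distillation loss that offline methods of this kind build on, in the style of the classic temperature-based formulation: the student is trained against both the hard labels and the teacher's softened output distribution. The function name, temperature and weighting below are illustrative placeholders, not the configuration used in the thesis.

```python
# Minimal sketch of standard output-response (soft-target) distillation;
# hyperparameters and names are placeholders, not the thesis's settings.
import torch
import torch.nn.functional as F

def response_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Combine the hard-label loss with a KL term on temperature-softened outputs."""
    # Hard-label cross-entropy keeps the task-driven signal.
    ce = F.cross_entropy(student_logits, labels)
    # Soft targets: the teacher's temperature-softened probability distribution.
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    log_soft_student = F.log_softmax(student_logits / T, dim=1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    return alpha * kd + (1.0 - alpha) * ce
```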
2) A reflective learning based method is proposed to address the performance limitation in output response-based online knowledge distillation caused by the lack of stable supervision. A notebook module is designed to record and smooth the outputs of the network ensemble, providing stable supervision for the networks during training. The records in the notebook module can also be checked against the sample labels to determine whether they are correct, allowing the networks to fully exploit the information in incorrect records. In addition, an error reflective coefficient scheduling strategy is designed to maintain, as far as possible, the proportion of the negative part of the reflective loss during training. Experiments on open-source datasets verify the effectiveness of the method, which improves performance at very low training cost: compared with advanced methods, it achieves average accuracy improvements of 0.63% and 1.91% on the CIFAR-100 and Tiny ImageNet datasets, respectively. No extra computational overhead is incurred at test time because the notebook module is not needed then. The experimental analysis also examines the effects of different parameter settings, indicating the method's stability and the effectiveness of the error reflective coefficient scheduling strategy.

3) A decoupled knowledge ensemble learning method is proposed to address the performance limitation in output response-based online knowledge distillation caused by the high homogenization of the networks. Decoupled knowledge generated by a temporal mean teacher network is introduced to avoid the collapse of network training, and an initialization strategy for the teacher network is designed so that decoupled knowledge can be constructed in the early stage of distillation. A decaying ensemble strategy is further designed to improve the robustness of the early-stage supervision and to reduce the drift of the supervision distribution in the later stage of training. A 2D geometric analysis diagram and a Monte Carlo simulation-based experiment illustrate the motivation and principle of the method, respectively. Multiple network structures are distilled on open-source datasets, and the method proves advanced and effective, improving accuracy by 0.33%, 0.37% and 0.07% over advanced methods on the CIFAR-10, CIFAR-100 and Tiny ImageNet datasets, respectively. Ablation experiments confirm its effectiveness and verify its stability and parameter sensitivity.
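The abstract does not specify how the notebook module is implemented. As a rough illustration only, one common way to stabilize ensemble supervision in online distillation is to keep a per-sample running average of the peers' soft outputs and to check each record against its label; the class name, momentum and interface below are assumptions, not the thesis's design.

```python
# Hypothetical sketch of stabilising online-distillation targets with a
# per-sample moving average of the peer ensemble's outputs; names and
# momentum are assumptions, not the thesis's notebook module.
import torch
import torch.nn.functional as F

class SoftTargetMemory:
    def __init__(self, num_samples, num_classes, momentum=0.9):
        # One smoothed probability vector per training sample, initialised uniform.
        self.memory = torch.full((num_samples, num_classes), 1.0 / num_classes)
        self.momentum = momentum

    @torch.no_grad()
    def update(self, sample_idx, peer_logits_list):
        # Average the peers' predicted distributions, then smooth across epochs.
        ensemble = torch.stack([F.softmax(l, dim=1) for l in peer_logits_list]).mean(0)
        self.memory[sample_idx] = (self.momentum * self.memory[sample_idx]
                                   + (1.0 - self.momentum) * ensemble)

    def is_correct(self, sample_idx, labels):
        # A recorded target counts as correct if its arg max matches the label.
        return self.memory[sample_idx].argmax(dim=1) == labels

    def targets(self, sample_idx):
        # Smoothed soft targets used as stable supervision during training.
        return self.memory[sample_idx]
```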
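Likewise, the abstract does not detail how the temporal mean teacher is constructed. A minimal sketch of the usual construction, assuming an exponential moving average over the student's weights, is given below; the decay value and helper names are placeholders, and how the thesis combines this teacher with its decaying ensemble strategy is not described in the abstract.

```python
# Hypothetical sketch of a temporal mean teacher: an exponential moving
# average of the student's weights that supplies soft targets distinct
# from the student's own outputs. Decay and names are assumptions.
import copy
import torch

def make_mean_teacher(student):
    # Start from a frozen copy of the student.
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    # teacher <- decay * teacher + (1 - decay) * student, parameter by parameter.
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param, alpha=1.0 - decay)
    # Keep running statistics (e.g. batch norm) in sync with the student.
    for t_buf, s_buf in zip(teacher.buffers(), student.buffers()):
        t_buf.copy_(s_buf)
```

In a setup like this, the student would additionally be supervised by the softened outputs of the mean teacher alongside its peers; this is a generic pattern, not a statement of the thesis's exact training procedure.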
Keywords/Search Tags:convolutional neural network, output response, knowledge distillation, multi-granularity knowledge, reflective learning, decoupled knowledge, ensemble learning