
Research On Knowledge Distillation In Convolutional Neural Network Model

Posted on: 2021-01-31 | Degree: Master | Type: Thesis
Country: China | Candidate: Y S Feng | Full Text: PDF
GTID: 2427330614468282 | Subject: Engineering
Abstract/Summary:
In recent years, with the development of information technology, deep learning has achieved excellent results in many fields, and convolutional neural networks have made indelible contributions to many computer vision tasks. However, the strong performance of convolutional neural networks comes at the cost of resource consumption: excessive parameters, computation, and energy use, together with long running times, prevent deployment on resource-constrained platforms such as mobile terminals and embedded devices. Research on model compression for convolutional neural networks is therefore of great significance.

Knowledge distillation is a promising approach among current model compression methods. In this framework, a pre-trained large model is called the teacher model and a small model to be trained is called the student model. Under the guidance of the teacher model, the student can obtain more structured information about the training data, so knowledge distillation can improve the performance of small models. This thesis proposes two new knowledge distillation algorithms, approached from the definition of knowledge and from the differences between teachers and students:

1. A knowledge distillation algorithm based on triplet distillation, mainly optimized for face recognition tasks built on the widely used triplet loss. The method first examines the common phenomenon, ignored by the original loss, that some pairs of people look more alike than others, and accordingly proposes the concept of face similarity. This similarity is defined as a form of the teacher's knowledge and is mapped into an appropriate range as a dynamic additive margin that is passed to the student model during training. The method has been validated on multiple validation sets.

2. A distillation algorithm based on multi-student distillation. The method first analyzes, both theoretically and experimentally, the impact of the differences between the teacher model and the student model in knowledge distillation. Because of the capacity gap between students and teachers, a student can only learn part of the teacher's knowledge. This thesis therefore exploits this difference and proposes a framework for training multiple students simultaneously: on the one hand, the students communicate with and learn from each other; on the other hand, the diversity among students is appropriately enlarged to increase the amount of effective information. To address the hardware and time overhead that synchronous multi-student training may incur, a multi-branch synchronous training framework is further proposed, which substantially reduces training time and hardware resource consumption. Both frameworks achieve better classification performance than previous methods on multiple validation datasets.
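To make the described losses concrete, the following is a minimal PyTorch-style sketch, not the thesis's actual implementation. The first function is the standard soft-label distillation objective (Hinton-style temperature-scaled KL divergence plus cross-entropy) that underlies teacher-student training; the second is one hypothetical reading of the "dynamic additive margin" triplet idea, in which the teacher's cosine similarity between anchor and negative faces is mapped into a margin range. The function names, the [m_min, m_max] mapping, and the hyperparameter values are illustrative assumptions.

```python
# Minimal sketch of the two loss components discussed above.
# Assumptions (not from the thesis text): a PyTorch setting, a standard
# soft-label distillation loss, and a hypothetical similarity-to-margin
# mapping for the triplet variant.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.7):
    """Standard knowledge-distillation objective: a weighted sum of the
    KL divergence to the teacher's softened outputs and the usual
    cross-entropy to the ground-truth labels."""
    soft_teacher = F.log_softmax(teacher_logits / temperature, dim=1)
    soft_student = F.log_softmax(student_logits / temperature, dim=1)
    kd = F.kl_div(soft_student, soft_teacher, log_target=True,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce


def dynamic_margin_triplet_loss(anchor, positive, negative,
                                t_anchor, t_negative,
                                m_min=0.2, m_max=0.6):
    """Hypothetical reading of the 'dynamic additive margin' idea: the
    teacher's cosine similarity between anchor and negative embeddings is
    mapped into [m_min, m_max] and used as a per-sample margin for the
    student's triplet loss (the thesis may use a different mapping)."""
    teacher_sim = F.cosine_similarity(t_anchor, t_negative)       # in [-1, 1]
    margin = m_min + (m_max - m_min) * (1.0 - (teacher_sim + 1.0) / 2.0)
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()
```

In this reading, face pairs that the teacher judges more similar receive a smaller margin, so the student is not forced to separate genuinely similar identities as aggressively; the exact mapping and weighting used in the thesis may differ.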
Keywords/Search Tags:Convolutional Neural Network, Model Compression, Knowledge Distillation, Face Recognition, Multiple Models