
Task-Adaptive Compression Method For BERT Via Truncation Before Fine-Tuning

Posted on: 2023-08-25    Degree: Master    Type: Thesis
Country: China    Candidate: J Fan    Full Text: PDF
GTID: 2568307169980899    Subject: Management Science and Engineering
Abstract/Summary:
As large-scale pre-trained language models have been proposed and developed, the accuracy of a great number of tasks has improved. Currently, a series of context-dependent pre-trained language models represented by BERT have become an essential component for solving natural language processing tasks. However, in order to encode as much linguistic knowledge as possible, these pre-trained models are designed to be complex and demand considerable computational power and memory. Consequently, their size can make fine-tuning and inference prohibitively slow, which prevents them from being applied in resource-limited, on-device, or streaming-data scenarios. To alleviate this, various model compression methods have been proposed, but most of them consider only reducing inference time, often ignoring significant increases in training time; the expenses of the fine-tuning stage are likewise not taken into consideration. Several issues therefore remain to be addressed. To save resources during both fine-tuning and inference, we propose to adaptively truncate the BERT language model before fine-tuning for text classification tasks, and we design a criterion based on separability theory that indicates where to truncate according to the hidden states. Our main contributions are as follows:

Firstly, we provide a detailed overview of existing model compression work. Specifically, we classify the methods into four categories, namely knowledge distillation, model pruning, quantization, and matrix decomposition, and then introduce each of them thoroughly.

Secondly, we propose a model compression method that truncates the BERT model, and we verify the feasibility of truncating before fine-tuning. Existing model compression methods focus only on the inference stage and ignore the costs of the fine-tuning stage. In fact, the expenses of fine-tuning far exceed those of inference and deserve attention. To address these problems, we propose to compress models by means of layer truncation before fine-tuning, saving costs during both the fine-tuning and inference stages.

Thirdly, we design a criterion that decides where to truncate before fine-tuning for binary text classification tasks. Existing model compression methods are too complex to apply and do not make full use of the hidden states. In this thesis, we take the intermediate features into consideration and put forward a criterion that indicates where to truncate before fine-tuning.

Fourthly, we transfer the truncation method from binary text classification to multi-class text classification. Based on the relation between the two settings, we propose a criterion that indicates where to truncate before fine-tuning for multi-class text classification tasks.

In summary, we study model compression before fine-tuning on 11 Chinese and English text classification tasks, concentrating on the BERT pre-trained language model. Our work is valuable and meaningful for the development of model compression for text classification and other text analysis tasks.
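The abstract does not give the exact form of the separability criterion, so the following is only a minimal sketch of the general idea: score each encoder layer by how well its [CLS] hidden states separate the classes, pick the shallowest layer whose score is close to the best one, and drop all layers above it before fine-tuning. It assumes the HuggingFace transformers library, a Fisher-style between-class/within-class ratio as a stand-in for the thesis's criterion, and hypothetical helper names (layer_separability, choose_truncation_layer, truncated_bert_classifier) and threshold value that are not taken from the thesis.

    # Illustrative sketch only: layer truncation of BERT before fine-tuning,
    # with a Fisher-style separability score standing in for the thesis's criterion.
    import torch
    from transformers import BertModel, BertForSequenceClassification

    def layer_separability(hidden_states, labels):
        """Score how well one layer's [CLS] vectors separate two classes.

        hidden_states: (batch, seq_len, hidden) tensor from one encoder layer.
        labels:        (batch,) tensor of 0/1 labels.
        Returns a ratio of between-class distance to within-class spread.
        """
        cls = hidden_states[:, 0, :]                     # [CLS] vector per example
        c0, c1 = cls[labels == 0], cls[labels == 1]
        between = (c0.mean(0) - c1.mean(0)).norm() ** 2  # distance between class means
        within = c0.var(0).sum() + c1.var(0).sum()       # spread inside each class
        return (between / (within + 1e-8)).item()

    @torch.no_grad()
    def choose_truncation_layer(model_name, input_ids, attention_mask, labels, threshold=0.9):
        """Pick the shallowest layer whose separability is close to the best layer's."""
        encoder = BertModel.from_pretrained(model_name, output_hidden_states=True).eval()
        outputs = encoder(input_ids=input_ids, attention_mask=attention_mask)
        # outputs.hidden_states = (embeddings, layer 1, ..., layer 12); skip the embeddings
        scores = [layer_separability(h, labels) for h in outputs.hidden_states[1:]]
        best = max(scores)
        for k, s in enumerate(scores, start=1):
            if s >= threshold * best:
                return k
        return len(scores)

    def truncated_bert_classifier(model_name, keep_layers, num_labels=2):
        """Drop the top encoder layers *before* fine-tuning, keeping only keep_layers."""
        model = BertForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
        model.bert.encoder.layer = model.bert.encoder.layer[:keep_layers]
        model.config.num_hidden_layers = keep_layers
        return model  # fine-tune this shallower model as usual

In this sketch the truncated model keeps only the bottom keep_layers encoder blocks, so both the subsequent fine-tuning passes and later inference run over a shallower network, which is the cost saving the abstract targets.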
Keywords/Search Tags: Natural Language Processing, Text-Level Classification, Model Compression, Pre-trained Language Model, BERT