
Task-Adaptive Compression Method For BERT Via Truncation Before Fine-Tuning

Posted on: 2023-08-25    Degree: Master    Type: Thesis
Country: China    Candidate: J Fan    Full Text: PDF
GTID: 2568307169980899    Subject: Management Science and Engineering
Abstract/Summary:
As large-scale pre-trained language models have been proposed and developed, the accuracy of a great number of tasks has improved. Currently, a series of context-dependent pre-trained language models represented by BERT have become an essential component for solving natural language processing tasks. However, in order to encode as much linguistic knowledge as possible, these pre-trained models are designed to be complex and demand considerable computational power and memory. Consequently, their size can make fine-tuning and inference prohibitively slow, which prevents them from being applied in resource-limited, on-device, or streaming-data scenarios. To alleviate this, various model compression methods have been proposed, but most of them consider only reducing inference time, often ignoring significant increases in training time; the expenses of the fine-tuning stage are likewise not taken into consideration. Several issues therefore remain to be addressed. To save resources during both fine-tuning and inference, we propose to adaptively truncate the BERT language model before fine-tuning for text classification tasks, and we design a criterion based on separability theory that indicates where to truncate according to the hidden states. Our main contributions are as follows:

Firstly, we provide a detailed overview of existing model compression work. Specifically, we classify the methods into four categories, namely knowledge distillation, model pruning, quantization, and matrix decomposition, and then introduce each of them thoroughly.

Secondly, we propose a model compression method that truncates the BERT model, and we verify the feasibility of truncating before fine-tuning. Existing model compression methods focus only on the inference stage and ignore the costs of the fine-tuning stage. In fact, the expenses of fine-tuning far exceed those of inference and deserve attention. To address these problems, we propose to compress models by means of layer truncation before fine-tuning, saving costs during both the fine-tuning and inference stages.

Thirdly, we design a criterion that decides where to truncate before fine-tuning for binary text classification tasks. Existing model compression methods are too complex to apply and do not make full use of the hidden states. In this thesis, we take the intermediate features into consideration and put forward a criterion that indicates where to truncate before fine-tuning.

Fourthly, we transfer the truncation method from binary text classification to multi-class text classification. Based on the relation between the two settings, we propose a criterion that indicates where to truncate before fine-tuning for multi-class text classification tasks.

In summary, we study model compression before fine-tuning on 11 Chinese and English text classification tasks, concentrating on the BERT pre-trained language model. Our work is valuable and meaningful for the development of model compression for text classification and other text analysis tasks.
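The abstract does not give the exact form of the separability criterion, so the following is only a minimal sketch of the general idea: score each encoder layer by how well its [CLS] hidden states separate the classes, pick the shallowest layer whose score is close to the best one, and drop all layers above it before fine-tuning. It assumes the HuggingFace transformers library, a Fisher-style between-class/within-class ratio as a stand-in for the thesis's criterion, and hypothetical helper names (layer_separability, choose_truncation_layer, truncated_bert_classifier) and threshold value that are not taken from the thesis.

    # Illustrative sketch only: layer truncation of BERT before fine-tuning,
    # with a Fisher-style separability score standing in for the thesis's criterion.
    import torch
    from transformers import BertModel, BertForSequenceClassification

    def layer_separability(hidden_states, labels):
        """Score how well one layer's [CLS] vectors separate two classes.

        hidden_states: (batch, seq_len, hidden) tensor from one encoder layer.
        labels:        (batch,) tensor of 0/1 labels.
        Returns a ratio of between-class distance to within-class spread.
        """
        cls = hidden_states[:, 0, :]                     # [CLS] vector per example
        c0, c1 = cls[labels == 0], cls[labels == 1]
        between = (c0.mean(0) - c1.mean(0)).norm() ** 2  # distance between class means
        within = c0.var(0).sum() + c1.var(0).sum()       # spread inside each class
        return (between / (within + 1e-8)).item()

    @torch.no_grad()
    def choose_truncation_layer(model_name, input_ids, attention_mask, labels, threshold=0.9):
        """Pick the shallowest layer whose separability is close to the best layer's."""
        encoder = BertModel.from_pretrained(model_name, output_hidden_states=True).eval()
        outputs = encoder(input_ids=input_ids, attention_mask=attention_mask)
        # outputs.hidden_states = (embeddings, layer 1, ..., layer 12); skip the embeddings
        scores = [layer_separability(h, labels) for h in outputs.hidden_states[1:]]
        best = max(scores)
        for k, s in enumerate(scores, start=1):
            if s >= threshold * best:
                return k
        return len(scores)

    def truncated_bert_classifier(model_name, keep_layers, num_labels=2):
        """Drop the top encoder layers *before* fine-tuning, keeping only keep_layers."""
        model = BertForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
        model.bert.encoder.layer = model.bert.encoder.layer[:keep_layers]
        model.config.num_hidden_layers = keep_layers
        return model  # fine-tune this shallower model as usual

In this sketch the truncated model keeps only the bottom keep_layers encoder blocks, so both the subsequent fine-tuning passes and later inference run over a shallower network, which is the cost saving the abstract targets.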
Keywords/Search Tags: Natural Language Processing, Text-Level Classification, Model Compression, Pre-trained Language Model, BERT