With the rapid development of deep learning, pre-trained models have achieved remarkable results in computer vision and natural language processing. However, to fully exploit large-scale unsupervised data, the parameter counts of pre-trained models have kept growing, and the resulting computational and memory costs make it difficult to deploy these models on resource-constrained devices. Compressing and optimizing pre-trained models has therefore become a pressing issue. Pruning, an essential model compression method, identifies redundant parameters and structures in a model and removes insignificant connections without substantially degrading accuracy, thereby reducing the model's computational and memory requirements. Existing pruning techniques focus mainly on evaluating the importance of model connections. On the one hand, they pay little attention to the generalization ability of the pruned model and the preservation of the prior knowledge encoded in the pre-trained model; on the other hand, they lack human guidance during the pruning process. To address these problems, this thesis targets the compression of the pre-trained BERT model, and its main contributions are as follows:

(1) A subset-selection-based BERT pruning method is proposed. First, the attention head selection problem is cast as a subset selection problem: the attention weight vectors are used to quantify an importance score for each attention head, a sampling procedure retains the important heads, and the non-differentiability of sampling is handled with the Gumbel-Softmax trick. Second, a meta-learning-based training strategy is used to train the pruned model, which yields faster learning and better generalization. Third, the KL divergence between the embedding distributions produced by the model before and after pruning is used to verify whether the prior knowledge learned by the pre-trained model is preserved. Finally, a series of experiments evaluates the proposed method in terms of model accuracy, generalization ability, and preservation of prior knowledge.

(2) This thesis presents BHPVAS (Visual Analysis System for Pruning Attention Heads in the BERT model), an interactive visual analysis system that helps users generate pruning schemes for pre-trained BERT models and investigate the model's internal mechanisms. BHPVAS provides multiple attention head evaluation criteria and presents the model structure, training data, pruning history, and related information through several coordinated interactive views. During pruning, BHPVAS uses text dependency relations and attention weight distributions to examine the role of attention heads in the model's predictions, supporting multiple rounds of pruning from the original model and the generation of pruning schemes. The effectiveness of BHPVAS is demonstrated through two case studies: sentiment classification sample analysis and pruning scheme exploration.
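
Contribution (1) describes sampling attention heads for retention and using the Gumbel-Softmax trick to keep the selection differentiable. The following is a minimal PyTorch sketch of what such differentiable head selection could look like; the function name `sample_head_mask`, the two-column prune/keep logits, and the example importance scores are illustrative assumptions, not the thesis's actual implementation.

```python
import torch
import torch.nn.functional as F

def sample_head_mask(head_logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Draw a differentiable keep/prune decision for each attention head.

    head_logits: (num_heads, 2) logits, where column 0 means "prune" and
    column 1 means "keep". Returns a (num_heads,) mask that is discrete in
    the forward pass but admits gradients w.r.t. the logits.
    """
    # gumbel_softmax relaxes the discrete keep/prune sample; hard=True gives
    # a one-hot sample in the forward pass while using the soft sample's
    # gradients in the backward pass (straight-through estimator).
    sample = F.gumbel_softmax(head_logits, tau=tau, hard=True)
    return sample[:, 1]  # the "keep" component acts as the head mask

# Example: hypothetical importance scores derived from attention weights.
importance = torch.tensor([0.9, 0.1, 0.6, 0.3])           # one score per head
logits = torch.stack([-importance, importance], dim=-1)    # prune vs. keep logits
mask = sample_head_mask(logits, tau=0.5)
print(mask)  # e.g. tensor([1., 0., 1., 0.]); differentiable if the logits require grad
```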
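
Contribution (1) also measures, via KL divergence, how far the pruned model's embedding representations drift from those of the original model. The sketch below shows one possible way to compute such a measure in PyTorch, assuming each embedding vector is normalized into a probability distribution with softmax; the function name `embedding_kl`, the tensor shapes, and the softmax normalization are assumptions rather than the thesis's exact procedure.

```python
import torch
import torch.nn.functional as F

def embedding_kl(original_emb: torch.Tensor, pruned_emb: torch.Tensor) -> torch.Tensor:
    """Mean KL divergence between the original and pruned embedding distributions.

    Both tensors have shape (num_tokens, hidden_dim). Each embedding vector is
    turned into a distribution with softmax, and KL(P_original || Q_pruned) is
    averaged over tokens; a small value suggests the pruned model preserves
    the representations (and hence the prior knowledge) of the original model.
    """
    p_log = F.log_softmax(original_emb, dim=-1)
    q_log = F.log_softmax(pruned_emb, dim=-1)
    # kl_div takes log-probabilities as input and, with log_target=True, as
    # target as well; reduction="batchmean" averages over the token dimension.
    return F.kl_div(q_log, p_log, log_target=True, reduction="batchmean")

# Hypothetical embeddings: the pruned model's outputs perturb the original's.
original = torch.randn(8, 768)
pruned = original + 0.05 * torch.randn(8, 768)
print(embedding_kl(original, pruned))
```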