Semantic representation models are the cornerstone of natural language processing and largely determine how useful the extracted linguistic features are for downstream tasks. Early semantic representation models produced a fixed word vector for each word regardless of the utterance it appeared in. These models achieved good results at first, but words are polysemous: the same word carries different meanings in different contexts, so context-independent representations handle such words poorly. In recent years, semantic representation models have made great progress by incorporating transfer learning and adopting the pre-training and fine-tuning paradigm, which has proved effective at resolving word polysemy. Because this paradigm is now so widely used, these newer semantic representation models are also commonly referred to as pre-trained models.

The most important module of the BERT model is multi-head attention. Building on it, this paper proposes fused multi-head attention, which fuses the high-level and low-level features within multi-head attention so that the model can more fully learn the semantics of a sentence. Applying fused multi-head attention to the BERT model yields the fused multi-head attention model (CMH-BERT). Experiments on six publicly available Chinese datasets verify its effectiveness: accuracy improves by 0.79% on the LCQMC dataset and by 0.98% on the Douban dataset, and CMH-BERT also outperforms the BERT model on the other three datasets, which leave limited room for improvement, with the exception of the ChnSentiCorp dataset. Because the proposed fused multi-head attention is applied in the fine-tuning phase, the BERT model does not need to be retrained to reach its best performance, which improves training efficiency.

Pre-trained models usually have very large parameter counts and long running times, so adaptive inference is an effective method for model acceleration. The FastBERT model adds adaptive inference to the BERT model and trains it with knowledge distillation. Adaptive inference gives simple sentences a fast response and reserves deeper processing for difficult sentences. Since reducing the response time of simple sentences and reducing the number of sentences treated as difficult are both effective ways to speed up adaptive inference, this paper proposes a spatial loss function based on adaptive inference and applies it to the FastBERT model, yielding the FastBERT-SLAI model. Compared with the BERT model, FastBERT-SLAI achieves a speedup of 4-19 times with an accuracy reduction of 1%-3%; compared with the FastBERT model, it achieves a speedup of 2-10 times with an accuracy improvement of 0.05%-1%. These experiments demonstrate the effectiveness of the spatial loss function based on adaptive inference.

This paper also analyzes the CMH-BERT model in terms of both parameter count and computational cost. Since adding modules to the backbone model, i.e., the BERT model, inevitably makes the model larger and more expensive to run, this paper adds the adaptive mechanism to the CMH-BERT model to reduce its computation, yielding the CMH-BERT model based on adaptive inference (CMH-BERT-SLAI). The CMH-BERT-SLAI model is evaluated on the same six publicly available Chinese datasets: compared with the CMH-BERT model it is 5-11 times faster with a 1%-2% drop in accuracy, and compared with the FastBERT-SLAI model it improves accuracy by 0.1%-0.9% at roughly twice the computational cost. The experiments show that the CMH-BERT-SLAI model strikes a balance between speed and accuracy.
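The fused multi-head attention described above combines high-level and low-level features inside multi-head attention. Since the abstract does not specify the fusion operator, the following is only a minimal PyTorch sketch in which "fusion" is assumed to be a learnable gated mix of an earlier layer's output (low-level features) and the current layer's attention output (high-level features); the class name, gate, and shapes are illustrative and not the actual CMH-BERT implementation.

```python
import torch
import torch.nn as nn

class FusedMultiHeadAttention(nn.Module):
    """Hypothetical sketch: gate together low-level (earlier-layer) and
    high-level (current-layer) multi-head attention features."""

    def __init__(self, hidden_size: int = 768, num_heads: int = 12):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # Learnable gate deciding how much low-level feature to mix back in.
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor,
                low_level_states: torch.Tensor) -> torch.Tensor:
        # High-level features: ordinary self-attention over the current layer input.
        high, _ = self.attn(hidden_states, hidden_states, hidden_states)
        # Fuse with low-level features from an earlier layer via a sigmoid gate.
        g = torch.sigmoid(self.gate(torch.cat([high, low_level_states], dim=-1)))
        return g * high + (1.0 - g) * low_level_states

# Toy usage: batch of 2 sentences, 16 tokens, BERT-base hidden size.
x_low = torch.randn(2, 16, 768)   # e.g. output of an early encoder layer
x_cur = torch.randn(2, 16, 768)   # input to the current encoder layer
fused = FusedMultiHeadAttention()(x_cur, x_low)
print(fused.shape)  # torch.Size([2, 16, 768])
```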
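FastBERT-style adaptive inference attaches a small classifier to each encoder layer and exits as soon as the current prediction is confident, so simple sentences get a fast response and only difficult ones pass through all layers. The sketch below illustrates that early-exit loop with the normalized-entropy criterion FastBERT uses; the `encoder_layers` and `exit_classifiers` arguments are placeholders, a single sentence (batch size 1) is assumed, and neither the distillation training step nor the spatial loss function proposed in this paper is shown.

```python
import torch
import torch.nn.functional as F

def normalized_entropy(probs: torch.Tensor) -> torch.Tensor:
    # Entropy of the class distribution, scaled to [0, 1] so a single
    # "speed" threshold works for any number of classes.
    n = probs.size(-1)
    ent = -(probs * torch.log(probs + 1e-12)).sum(dim=-1)
    return ent / torch.log(torch.tensor(float(n)))

@torch.no_grad()
def adaptive_forward(hidden, encoder_layers, exit_classifiers, speed: float = 0.5):
    """Early-exit inference: run encoder layers one by one and stop once the
    per-layer classifier is confident (low normalized entropy)."""
    probs = None
    for layer, clf in zip(encoder_layers, exit_classifiers):
        hidden = layer(hidden)
        probs = F.softmax(clf(hidden[:, 0]), dim=-1)   # classify on the [CLS] token
        if normalized_entropy(probs).item() < speed:   # confident enough: exit early
            return probs
    return probs  # fell through every layer: use the deepest classifier's output
```

A larger `speed` threshold lets more sentences exit at shallow layers (faster, slightly less accurate), while a smaller threshold pushes more sentences through the full network.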