As an important tool in machine translation, question answering systems, and other fields, named entity recognition occupies an important position in the gradual maturation of natural language processing technology. However, annotated corpus data for the named entity recognition task is scarce, and large-scale data annotation is very costly. This work therefore combines semi-supervised learning with deep learning, using a small amount of annotated data to train a model that identifies and classifies entities. The work addresses the following aspects.

First, to address the low recognition rate of entity labels in semi-supervised learning, a method for enhancing semi-supervised label prediction is studied. A data-distribution-based KL regularization operator and a confidence-based semi-supervised clustering method are proposed to improve label recognition accuracy. Named entity recognition results on a web-crawled dataset of 48,000 newspaper article citations show that the proposed method improves the F1 value by 15% compared with the baseline model, which demonstrates its effectiveness for enhancing semi-supervised label prediction.

Second, an improved Tri-training algorithm is proposed to address label noise and low training accuracy in the semi-supervised Tri-training algorithm. First, a constraint operator based on KL divergence is introduced: an initial error threshold is set, and the error between data distributions is calculated by KL divergence. When the error is greater than the threshold, the classifier is retrained, thereby reducing the influence of label noise. Second, a confidence-based improved KL divergence clustering method is introduced, and semi-supervised clustering of high-confidence words in the unlabeled data is used to improve model performance. Finally, because the classifiers in the Tri-training algorithm have low diversity, three different classifiers are used to improve the algorithm, which increases the generalization ability and diversity of the model while also improving classification accuracy.

Third, a semi-supervised named entity recognition model is proposed that combines the improved Tri-training algorithm with the deep learning Bi-LSTM+CRF model. On top of the Bi-LSTM+CRF model, a pre-trained language model and a multi-head attention mechanism are introduced to raise the upper limit of model performance. At the same time, to compensate for the lack of large amounts of annotated data, the improved semi-supervised Tri-training algorithm is introduced to construct the semi-supervised named entity recognition model. Experimental results on the English CoNLL-2003 dataset and a Chinese dataset of judicial theft judgment documents show that the F1 value of the proposed model increases by 5% and 10%, respectively, compared with the baseline model, and that the model is competitive with other improved models.

Comparative analysis of experiments on multiple datasets shows that the proposed method achieves good results on the named entity recognition task.
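
The KL-divergence constraint described for the improved Tri-training algorithm can be illustrated with a minimal sketch. The abstract does not state which distributions are compared, so this example assumes it is the entity-label distribution of the annotated data versus that of the pseudo-labeled data. The function names, the smoothing constant `eps`, and the threshold value are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def label_distribution(labels, label_set, eps=1e-9):
    """Smoothed relative frequency of each label (eps avoids zero probabilities)."""
    counts = Counter(labels)
    total = len(labels) + eps * len(label_set)
    return [(counts[label] + eps) / total for label in label_set]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same label order."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def needs_retraining(gold_labels, pseudo_labels, label_set, threshold):
    """Return True when the pseudo-label distribution drifts past the threshold.

    Assumption: drift is measured as KL(gold || pseudo); the original work
    may define the error between distributions differently.
    """
    p = label_distribution(gold_labels, label_set)
    q = label_distribution(pseudo_labels, label_set)
    return kl_divergence(p, q) > threshold


if __name__ == "__main__":
    label_set = ["PER", "LOC", "ORG", "O"]
    gold = ["PER", "LOC", "ORG", "O"] * 25
    balanced_pseudo = list(gold)   # same label distribution as the gold data
    skewed_pseudo = ["O"] * 100    # a classifier that collapsed to one label

    print(needs_retraining(gold, balanced_pseudo, label_set, threshold=0.1))
    print(needs_retraining(gold, skewed_pseudo, label_set, threshold=0.1))
```

In a Tri-training loop, a check of this kind would run after each round of pseudo-labeling, and a classifier whose pseudo-labels fail it would be retrained rather than allowed to add noisy labels to the training pool.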