In recent years, deep learning techniques have made great strides in natural language processing (NLP). However, most deep learning models are built by learning from a large number of labelled samples for the target task. In some cases, obtaining sufficient training samples is very difficult because of data privacy concerns, and labelling a large number of samples is time-consuming and costly. Few-shot (small-sample) learning methods have been proposed to solve NLP problems when only a small number of samples is available. However, most of these few-shot learning models have a large number of parameters and high model complexity, and therefore require substantial computing power. In this thesis, we propose a few-shot learning method based on the MLM (Masked Language Model). Experimental results on several datasets show that this method outperforms classical machine learning and deep learning methods on few-shot learning tasks. The specific research work in this thesis is as follows.

(1) A few-shot training method, FPT-MLM (Few-shot Pattern Training based on MLM), is proposed for few-shot NLP tasks. The method masks a small portion of the tokens at a time and trains repeatedly on the same sample, yielding a model that fuses bidirectional contextual information. It then converts each test-set input into a fill-in-the-blank (cloze) probability problem by adding an appropriate prefix or suffix, and decodes the output with the decoding layer to obtain the probabilities of the predicted tag sequences, which are used for natural language recognition, classification and analysis.

(2) The proposed FPT-MLM method is applied to entity recognition in patent texts. Annotated data in the patent domain are scarce, and traditional entity recognition methods achieve low accuracy when only a small number of samples is available. To address these problems, a small number of patent abstracts in the field of pressure sensor preparation were selected and manually annotated with the BMEO (Begin, Middle, End, Outside) tagging scheme, establishing a labelled experimental corpus of Chinese patent abstracts. Using FPT-MLM, the entity recognition task in this specific patent domain was completed, and the results were superior in accuracy and F-score to those of machine learning and deep learning methods.

(3) The proposed FPT-MLM method is applied to few-shot sentiment analysis and short-text classification. The experimental results show that the method effectively reduces model complexity while maintaining performance comparable to the benchmark methods.
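The repeated-masking step in item (1) can be illustrated with a minimal Python sketch. It assumes character-level tokens (common for Chinese text), a plain `[MASK]` placeholder string, and illustrative values for the hypothetical `mask_ratio` and `num_views` parameters; the thesis does not state these values, and the actual model update is omitted. Each call produces several differently masked copies of one sample, the kind of input that repeated MLM training on the same sample would consume.

```python
import random

MASK = "[MASK]"  # placeholder string; real tokenizers define their own mask token


def masked_views(tokens, mask_ratio=0.15, num_views=4, seed=0):
    """Build several differently masked copies of one training sample.

    mask_ratio and num_views are hypothetical, illustrative parameters.
    Returns a list of (masked_tokens, targets) pairs, where targets maps each
    masked position to the original token the MLM should recover.
    """
    rng = random.Random(seed)
    # Mask only a small portion of the tokens (at least one, if any exist).
    n_mask = min(len(tokens), max(1, int(len(tokens) * mask_ratio)))
    views = []
    for _ in range(num_views):
        positions = rng.sample(range(len(tokens)), n_mask)
        masked = list(tokens)
        targets = {}
        for p in positions:
            targets[p] = masked[p]
            masked[p] = MASK
        views.append((masked, targets))
    return views


sample = list("一种压力传感器的制备方法")  # character-level tokens
for masked, targets in masked_views(sample, mask_ratio=0.2, num_views=3):
    print("".join(masked), targets)
```

Because each view hides a different subset, the same sample yields several prediction targets, which is one plausible reading of "trains repeatedly on the same sample".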
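The cloze conversion in item (1) can be sketched as follows, assuming a hypothetical suffix template ("总之，这很[MASK]。", roughly "In short, this is very [MASK].") and a hypothetical verbalizer that maps each label to one vocabulary token. A real MLM would supply the logits at the `[MASK]` position; here a plain dictionary stands in for that output, so the sketch only shows how label-word scores become label probabilities through a softmax restricted to the verbalizer tokens. This is a common prompt-based formulation, not necessarily the exact decoding used by FPT-MLM.

```python
import math


def build_cloze(text, template="{text}。总之，这很[MASK]。"):
    """Wrap an input in a hypothetical suffix template with one [MASK] slot."""
    return template.format(text=text)


def label_probs(mask_logits, verbalizer):
    """Convert MLM logits at the [MASK] position into label probabilities.

    mask_logits: dict mapping vocabulary token -> logit (stand-in for a real MLM)
    verbalizer:  dict mapping label -> the token that represents it
    """
    scores = {label: mask_logits[word] for label, word in verbalizer.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


prompt = build_cloze("传感器灵敏度高，性能稳定")
print(prompt)  # 传感器灵敏度高，性能稳定。总之，这很[MASK]。

# Made-up logits standing in for the MLM output at the [MASK] position;
# "的" is ignored because it is not a verbalizer token.
fake_logits = {"好": 2.0, "差": 0.0, "的": 1.0}
verbalizer = {"positive": "好", "negative": "差"}
probs = label_probs(fake_logits, verbalizer)
print(max(probs, key=probs.get))  # positive
```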
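The BMEO annotation in item (2) assigns one tag per character: B for the first character of an entity, M for inner characters, E for the last character, and O for characters outside any entity. A minimal sketch of converting character spans into BMEO tags follows; the entity type `DEV` is a hypothetical label, and tagging a one-character entity with B alone is an assumption, since BMEO as described has no single-token tag.

```python
def bmeo_tags(text, entities):
    """Tag each character with B/M/E/O given entity spans.

    entities: list of (start, end, entity_type) with end exclusive.
    Entity types such as "DEV" below are hypothetical examples.
    Assumption: a one-character entity is tagged with B alone.
    """
    tags = ["O"] * len(text)
    for start, end, etype in entities:
        tags[start] = f"B-{etype}"
        if end - start > 1:
            for i in range(start + 1, end - 1):
                tags[i] = f"M-{etype}"
            tags[end - 1] = f"E-{etype}"
    return tags


text = "一种压力传感器的制备方法"
# "压力传感器" (pressure sensor) occupies characters 2..6.
for ch, tag in zip(text, bmeo_tags(text, [(2, 7, "DEV")])):
    print(ch, tag)
```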
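Item (2) reports accuracy and F-score without specifying how they are computed. One common convention, assumed here, is entity-level precision, recall and F1: an entity counts as correct only if its start, end and type all match the gold annotation. The sketch decodes entities from BMEO tag sequences (treating a lone B as a one-character entity and skipping B/M runs with no closing E, both assumptions) and then scores the predicted set against the gold set. The entity types `DEV` and `MAT` are hypothetical.

```python
def extract_spans(tags):
    """Decode (start, end_exclusive, type) entities from a BMEO tag list."""
    spans, i, n = set(), 0, len(tags)
    while i < n:
        prefix, _, etype = tags[i].partition("-")
        if prefix != "B":
            i += 1
            continue
        j = i + 1
        while j < n and tags[j] == f"M-{etype}":
            j += 1
        if j < n and tags[j] == f"E-{etype}":
            spans.add((i, j + 1, etype))  # well-formed B M* E entity
            i = j + 1
        elif j == i + 1:
            spans.add((i, i + 1, etype))  # lone B: one-character entity (assumption)
            i += 1
        else:
            i = j  # B followed by M... without a closing E: malformed, skipped


    return spans


def entity_f1(gold_seqs, pred_seqs):
    """Entity-level precision, recall and F1 over parallel tag sequences."""
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-DEV", "M-DEV", "E-DEV", "O", "B-MAT", "E-MAT"]]
pred = [["B-DEV", "M-DEV", "E-DEV", "O", "B-MAT", "M-MAT"]]
print(entity_f1(gold, pred))  # precision 1.0, recall 0.5, F1 about 0.667
```

Counting only exact span-and-type matches is stricter than per-character accuracy, which is why the two metrics can differ noticeably on the same predictions.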