
Research Of Chinese Medicine Terminology Recognition Based On Deep Learning And Active Learning

Posted on: 2020-03-17
Degree: Master
Type: Thesis
Country: China
Candidate: H Li
Full Text: PDF
GTID: 2404330623456441
Subject: Engineering
Abstract/Summary:
Ancient Chinese medicine books contain a wealth of clinical experience. Studying them with natural language processing makes it possible to mine the textual knowledge of the Chinese medicine field and to promote the integration and innovation of that knowledge. Named entity recognition is an important natural language processing technique that identifies named entities in text, helping readers quickly grasp the semantic information in the text and obtain the relevant knowledge. Chinese medicine terminology recognition applies named entity recognition to ancient Chinese medicine books. By efficiently recognizing Chinese medicine terms in these texts, it helps researchers advance the study of Chinese medicine books and supports related research fields such as text mining and information retrieval in Chinese medicine. The grammar of ancient Chinese medicine books is unique and flexible, which makes Chinese medicine terms difficult to identify. Research on Chinese medicine terminology recognition is currently scarce, so applying advanced named entity recognition techniques to this problem is an urgent task.

Deep learning models have achieved remarkable results in natural language processing tasks such as named entity recognition, yet research on applying them to Chinese medicine terminology recognition is very scarce. This thesis therefore proposes and designs a deep learning model, BERT-BiLSTM-CRF, for Chinese medicine terminology recognition. The design of the model combines a transfer learning strategy, a pretrained language model, and a classic machine learning model. In the experiments, the BERT-BiLSTM-CRF model performs excellently compared with various benchmark models.

Deep learning models require a large number of labeled samples, while manually labeling ancient Chinese medicine books as training samples is very costly. The author therefore studies how to apply active learning to the model and task of Chinese medicine terminology recognition. An active learning algorithm is designed to reduce the number of labeled samples required by the BERT-BiLSTM-CRF model and thereby lower the cost of manual labeling. In the experiments, the designed algorithm significantly reduces that cost. In addition, because conventional active learning algorithms do not fully consider the predictions for the terms in a sample when applied to Chinese medicine terminology recognition, an active learning algorithm based on entity granularity is proposed and designed, which is better suited to Chinese medicine terminology recognition and other named entity recognition tasks. The entity-granularity algorithm is compared with the benchmark active learning algorithm in the experiments. The results show that it further reduces the number of labeled samples required by BERT-BiLSTM-CRF and further lowers the cost of manual labeling.
Keywords/Search Tags: Natural language processing, Named entity recognition, Chinese medicine terminology recognition, Deep learning, Active learning
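
The abstract does not say how the entity-granularity active learning algorithm scores samples, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes a tagger (for example, a BERT-BiLSTM-CRF model) that outputs a per-token probability distribution over tags, and it assumes least-confidence as the uncertainty measure. All function names, the tag set, and the probabilities are hypothetical. The baseline averages uncertainty over every token in a sentence, while the entity-level variant averages only over tokens predicted to be part of a term.

```python
from typing import Callable, Dict, List

# Hypothetical tagger output: for each token, a mapping tag -> probability.
TokenProbs = Dict[str, float]
Sentence = List[TokenProbs]


def token_uncertainty(probs: TokenProbs) -> float:
    """Least-confidence score: 1 minus the probability of the most likely tag."""
    return 1.0 - max(probs.values())


def sentence_score(sentence: Sentence) -> float:
    """Baseline score: mean uncertainty over every token in the sentence."""
    return sum(token_uncertainty(p) for p in sentence) / len(sentence)


def entity_score(sentence: Sentence, outside_tag: str = "O") -> float:
    """Entity-level score (assumed form): mean uncertainty over the tokens
    whose most likely tag is not the outside tag."""
    entity_tokens = [p for p in sentence if max(p, key=p.get) != outside_tag]
    if not entity_tokens:
        return 0.0  # no predicted term, so nothing to be uncertain about
    return sum(token_uncertainty(p) for p in entity_tokens) / len(entity_tokens)


def select_for_labeling(
    pool: List[Sentence],
    k: int,
    score: Callable[[Sentence], float] = entity_score,
) -> List[int]:
    """Return the indices of the k highest-scoring (most uncertain) sentences."""
    ranked = sorted(range(len(pool)), key=lambda i: score(pool[i]), reverse=True)
    return ranked[:k]


# Made-up probabilities for two unlabeled sentences.
pool: List[Sentence] = [
    # Sentence 0: three confident non-term tokens and one uncertain term token.
    [{"O": 0.99, "B": 0.01}, {"O": 0.99, "B": 0.01},
     {"O": 0.99, "B": 0.01}, {"B": 0.55, "O": 0.45}],
    # Sentence 1: mildly uncertain tokens, none predicted as a term.
    [{"O": 0.80, "B": 0.20}, {"O": 0.80, "B": 0.20}],
]

baseline_choice = select_for_labeling(pool, k=1, score=sentence_score)
entity_choice = select_for_labeling(pool, k=1, score=entity_score)
```

In this toy pool the two criteria disagree: the baseline prefers sentence 1, whose uncertainty is spread over non-term tokens, while the entity-level score prefers sentence 0, where the uncertainty sits on a predicted term. That difference is the intuition behind scoring at entity granularity.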