Semi-Supervised-Based Named Entity Recognition And System Application For Drug Patents

Posted on:2020-06-25

Degree:Master

Type:Thesis

Country:China

Candidate:Z Wang

Full Text:PDF

GTID:2404330596982637

Subject:Control engineering

Abstract/Summary:

With the rapid development of the life sciences, the pharmaceutical literature is growing exponentially. Extracting structured information about compounds from this large body of unstructured medical literature helps researchers in pharmaceutical and related fields carry out their studies and, in turn, promotes technological innovation in the drug industry. Chemical named entities, the main carriers of information when analyzing this literature, attract considerable attention from professionals, so recognizing them has become an important research topic. Among existing named entity recognition (NER) methods, the Long Short-Term Memory network with a Conditional Random Field layer (LSTM-CRF) is one of the most advanced and commonly deployed approaches. However, this supervised learning method usually requires a large labeled corpus, and labeled data are very limited in some specialized fields, such as the drug patents studied in this thesis. In such cases, the supervised model cannot tag the corresponding entities accurately.

To overcome this shortcoming, this study proposes a semi-supervised named entity recognition approach that combines a bidirectional long short-term memory network, word similarity, and a conditional random field layer (BiLSTM-WS-CRF). First, the vector representations of the words belonging to each label are clustered, and the resulting cluster center is taken as the representative of that label; an appropriate similarity measure is then selected to quantify the relationship between each input word and the different labels, producing a corresponding vector representation. Next, this vector is combined with the output of the BiLSTM hidden layer to calculate a confidence score. Finally, the score is fed into the CRF layer to obtain predicted tags that conform to the tagging scheme. In this way, the proposed model introduces unsupervised learning characteristics to guide the tagging process while preserving the advantages of the supervised BiLSTM-CRF model, which accounts for both long- and short-term dependencies within the input sequence and dependencies between labels.

Experiments show that, compared with the traditional baseline model and other commonly used semi-supervised methods, the proposed method has clear advantages in named entity recognition tasks in pharmaceutical and other specialized fields. To help researchers read and analyze the literature, this study also designs a software system for named entity recognition on drug patents, which implements text processing, word vector training, named entity recognition, entity visualization, and related functions. The system provides supporting information for medical research and helps accelerate the drug development process.
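
The word-similarity steps described in the abstract can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not say which clustering algorithm, similarity measure, or combination rule is used, so the sketch assumes a single mean vector per label as the cluster center, cosine similarity as the measure, and a weighted sum as the combination. All function names, the `weight` parameter, the tag names, and the toy vectors are hypothetical, and the `emission_scores` list merely stands in for scores derived from the BiLSTM hidden layer. The CRF decoding step is omitted.

```python
import math
from collections import defaultdict


def label_centroids(labeled_vectors):
    """Average the word vectors of each label.

    Assumption: one mean vector per label stands in for the
    cluster center mentioned in the abstract.
    """
    sums = {}
    counts = defaultdict(int)
    for label, vec in labeled_vectors:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def cosine(u, v):
    """Cosine similarity (assumed measure); returns 0.0 for a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_features(word_vec, centroids, labels):
    """One similarity value per label, in the order given by `labels`."""
    return [cosine(word_vec, centroids[label]) for label in labels]


def combine_scores(emission_scores, sim_features, weight=1.0):
    """Assumed combination rule: per-label score plus weighted similarity."""
    return [e + weight * s for e, s in zip(emission_scores, sim_features)]


# Toy 2-dimensional "word vectors" with hypothetical tags.
labeled = [
    ("B-CHEM", [1.0, 0.0]),
    ("B-CHEM", [0.8, 0.2]),
    ("O", [0.0, 1.0]),
    ("O", [0.2, 0.8]),
]
labels = ["B-CHEM", "O"]

centroids = label_centroids(labeled)
features = similarity_features([0.9, 0.1], centroids, labels)

# Placeholder per-label scores standing in for the BiLSTM output.
scores = combine_scores([0.1, 0.3], features)
best_label = labels[scores.index(max(scores))]
print(best_label)
```

In this toy case the placeholder scores alone would favor the `O` tag, while the similarity term shifts the combined score toward `B-CHEM`, which is the kind of guidance from unlabeled word vectors that the abstract describes. In the full model, the combined scores would be passed to the CRF layer rather than decoded per word.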

Keywords/Search Tags:

Named Entity Recognition, Drug patents, Semi-supervised Learning, Word similarity

Related items

1	Research And Implementation Of Chinese Named Entity Recognition Algorithm For Medical Field
2	Research On Named Entity Recognition Technology For TCM Field
3	Research On Chinese Medical Text Named Entity Recognition Based On Semi-Supervised Multi-Feature Model
4	Research On Medical Entity And Assertion Recognition Based On Active Learning And Semi-supervised Learning
5	A Comparative Study Of Named Entity Recognition In The Recognition Of Traditional Chinese Medicine Nouns And Prescription Nouns
6	A Method Of Medical Named Entity Recognition Based On Limited Labels
7	Named Entity Recognition In Medical Field Based On Deep Learning Of Chinese
8	Research On Named Entity Recognition And Entity Relationship Extraction Of Medical Data Text Based On Attention
9	Deep Learning Based Medical Named Entity Recognition
10	Research On Method Of Medical Named Entity Recognition Based On Pre-trained Model