In recent years, with advances in biomedical experimental methods, experimental data and literature resources in biomedical science have grown exponentially. How to quickly and effectively extract valuable information from such a huge repository of scientific literature is an urgent problem. In biomedical research, the recognition and normalization of entities (such as genes/proteins, chemicals and diseases) is the foundation of biomedical text mining. It is of great significance for extracting relations between biomedical entities and for building biomedical knowledge bases. Among these tasks, disease named entity recognition and normalization aims to automatically extract disease names from the biomedical literature and link them to a designated disease database. For this task, this paper carries out research in three areas:

(1) Disease named entity recognition based on syntactic and semantic features. To address the problems in current disease named entity recognition, a series of new syntactic and semantic features are proposed on top of a conditional random field (CRF) model. These features capture the structural information of disease names within a sentence and the semantic information contained in the database. Experimental results show that the proposed features achieve better results on the disease named entity recognition task.

(2) Disease named entity recognition based on deep learning. To alleviate the feature sparsity problem of conventional machine learning, this paper adopts a state-of-the-art deep learning model, BiLSTM-CRF, to recognize disease names, and also explores the influence of various syntactic and semantic features on disease name recognition. The experiments indicate that the deep learning model also achieves results comparable to the state of the art.

(3) Disease named entity normalization based on context information. Disease name normalization is cast as a classification task. First, based on the characteristics of disease names, two fuzzy matching algorithms are used to generate candidate sets, which improves the recall of the candidate sets. Then, context information is integrated into a neural network to disambiguate among the candidates. Experimental results show that the method adopted in this paper achieves good performance on disease name normalization.
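The candidate-generation step in (3) can be illustrated with a small sketch. The abstract does not name its two fuzzy matching algorithms, so this Python example assumes two common string-similarity measures as stand-ins: character-trigram Jaccard similarity and the `difflib.SequenceMatcher` ratio. The toy lexicon, the concept IDs (`C1`, `C2`, `C3`), the threshold and `top_k` are hypothetical illustration values, not taken from the paper.

```python
from difflib import SequenceMatcher


def char_ngrams(text: str, n: int = 3) -> set:
    """Character n-grams of a lowercased string, padded with '#' at both ends."""
    padded = f"#{text.lower()}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: str, b: str) -> float:
    """Fuzzy matcher 1 (assumed): Jaccard similarity of character trigram sets."""
    ga, gb = char_ngrams(a), char_ngrams(b)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def edit_ratio(a: str, b: str) -> float:
    """Fuzzy matcher 2 (assumed): difflib's matching-block similarity ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def generate_candidates(mention, lexicon, threshold=0.5, top_k=5):
    """Return up to top_k (concept_id, score) pairs whose best-matching
    synonym scores at least `threshold` under either matcher."""
    scores = {}
    for name, concept_id in lexicon.items():
        # Taking the max of both scores keeps any candidate that either
        # matcher finds, which favours recall over precision.
        score = max(jaccard(mention, name), edit_ratio(mention, name))
        if score >= threshold:
            scores[concept_id] = max(score, scores.get(concept_id, 0.0))
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


# Hypothetical toy lexicon: synonym -> concept ID (several synonyms may share one ID).
LEXICON = {
    "breast cancer": "C1",
    "breast neoplasms": "C1",
    "colon cancer": "C2",
    "diabetes mellitus": "C3",
}

candidates = generate_candidates("breast cancers", LEXICON)
print(candidates)
```

Here the mention "breast cancers" keeps `C1` as its top candidate and also admits the near miss `C2`, while the unrelated `C3` is filtered out. Deciding between the remaining candidates is left to the context-aware neural disambiguation step, which this sketch does not attempt.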