| In recent years, the rapid development of NLP has brought many downstream applications into all aspects of daily life. NER is an important NLP task, and its recognition performance is essential for downstream applications such as knowledge graphs, machine translation, and question answering systems. Existing state-of-the-art NER methods still have some problems: (1) Lattice-based models introduce lexical-level features on top of word-level features, which improves the model's ability to recognize entity boundaries, but more hybrid features and external prior information could be introduced to further improve performance. (2) MRC-based models introduce prior information for the NER task and improve entity recognition performance, but they do not consider the important role of contextual information and hybrid features in the MRC model. This thesis approaches NER from the perspective of multidimensional hybrid feature representation and the MRC approach, and the main contributions are as follows: (1) An improved lattice-based NER approach, SLBERT, is proposed. The approach uses the MRC formulation to introduce prior information into the lattice model, thereby improving model performance. In the problem modeling stage, a natural language query is constructed for each entity and concatenated with the original text; the constructed query serves as the introduced prior information. In the feature extraction stage, sentence features of the text are extracted and fused with word and lexical features, so that features from more dimensions are encoded into the feature matrix for subsequent token classification. By introducing external prior information and incorporating more hybrid features into the model, named entity recognition performance is improved. (2) A novel MRC-based NER method, GFMRC, is proposed, which enhances the MRC model with contextual information and hybrid features to improve the model 
performance. In the preprocessing stage, the samples of the initial MRC dataset are concatenated with N-gram information, i.e., context is added to each sample. In the feature extraction stage, starting from the features produced by the MRC encoding layer, global features are extracted with a CNN and local features with an LSTM, separately for each token, and the three sets of features are fused. Moreover, in the token classification stage, a multi-task approach is used to filter negative samples, further reducing the chance of misclassification. Introducing context and hybrid features into MRC improves entity recognition performance. (3) Based on the proposed algorithms, a medical intelligent question answering system is constructed. Its main functional modules are intelligent Q&A, user evaluation, question management, and question statistics. The system analyzes the questions entered by users and searches the knowledge graph to automatically return the corresponding answers. It also provides a user feedback channel to collect user comments on the generated answers, and it implements quantitative statistics that count the types of questions whose answers are of low quality, so that the system can be improved in a targeted way. In the intelligent Q&A module, the named entity recognition algorithm proposed in this thesis is used for entity recognition; identifying named entities more accurately improves the model’s ability to understand user questions and demonstrates the practicality of the proposed algorithm in practice. |
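
The problem modeling step described above, in which a natural language query is concatenated with the original text and context is added to each sample, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the thesis: the query wording in `ENTITY_QUERIES`, the use of one query per entity type, the BERT-style `[SEP]` separator, and the approximation of the added context by a window of neighbouring sentences.

```python
from typing import Dict, List, Tuple

# Hypothetical queries, one per entity type. The thesis does not give
# its actual query wording, so these strings are placeholders.
ENTITY_QUERIES: Dict[str, str] = {
    "DISEASE": "Find the diseases mentioned in the text.",
    "DRUG": "Find the drugs mentioned in the text.",
}


def build_mrc_samples(
    sentences: List[str],
    index: int,
    window: int = 1,
    sep: str = " [SEP] ",
) -> List[Tuple[str, str]]:
    """Build one (entity_type, model_input) pair per entity type.

    The target sentence is sentences[index]. Up to `window` neighbouring
    sentences on each side are added as context (an assumed stand-in for
    the context described in the text), and the query is placed in front
    of the text, separated by `sep`.
    """
    left = sentences[max(0, index - window):index]
    right = sentences[index + 1:index + 1 + window]
    context = " ".join(left + [sentences[index]] + right)
    return [
        (entity_type, query + sep + context)
        for entity_type, query in ENTITY_QUERIES.items()
    ]


if __name__ == "__main__":
    doc = [
        "The patient reported chest pain.",
        "Aspirin was prescribed for hypertension.",
        "A follow-up was scheduled.",
    ]
    for entity_type, model_input in build_mrc_samples(doc, index=1):
        print(entity_type, "->", model_input)
```

Each resulting string would then be tokenized and passed to the encoder, with one pass per entity type, so the query acts as the prior information that tells the model which kind of entity to look for.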