Structured Methods For Pathological Reporting Of Lung Cancer

Posted on:2022-01-01

Degree:Master

Type:Thesis

Country:China

Candidate:X H Wu

Full Text:PDF

GTID:2504306494980529

Subject:Applied Statistics

Abstract/Summary:

Lung cancer pathology reports contain a great deal of disease-related information that is valuable for the prevention and treatment of lung cancer. However, these data are not used effectively: they are stored in databases as unstructured natural language, which computers cannot readily interpret. Structuring pathology reports would provide data support for medical research. Existing work on structuring, however, tends to focus on improving theoretical algorithms, and a single algorithm rarely meets the needs of engineering applications. Building on existing theoretical models, this thesis aims to construct a comprehensive extraction framework that combines a model with rules, in order to improve the accuracy of structuring lung cancer pathology reports and make the approach usable in practice.

The research covers the following. A comprehensive extraction framework was built using lung cancer diagnosis reports from a Grade III, Class A hospital in Shanghai, and the various entities in the reports were extracted by the model together with rules. For the model, several candidates were trained with different algorithms and word-embedding features and then compared, and a BERT-based CRF model was selected as the best. For the rules, regular expressions were used to match the "degree of differentiation" entity type, and the longest matched text was taken as the output. Experimental results: the overall accuracy of the final model was 0.987, and precision and recall reached 1 for several entity types.

Finally, a backtracking (traceability) module was designed. It supports viewing and counting entity types that were predicted incorrectly or missed, providing data support for improving and verifying extraction performance. Three problems (unreasonable entity type definitions, labeling errors, and too few training samples for some entity types) were addressed through entity type reconstruction, correction of the annotated corpus, and rule-based information extraction, which effectively improved prediction performance.

The thesis makes two main contributions. Theoretically, a comprehensive extraction framework combining rules with a model was constructed: first, the optimal model was selected by comparing the word-embedding methods word2vec and BERT; then, entity types that appear infrequently in the corpus were extracted by rules, further improving precision and recall across multiple entity types. Practically, the traceability module supports viewing incorrectly predicted and missed entity types for analysis and verification. It also links each case back to the original pathology report, making it easy to correct mislabeled data and improve the quality of the sequence-labeling corpus.
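The rule for the "degree of differentiation" entity described in the abstract (match with regular expressions, then keep the longest matched text) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the actual expressions, so the two patterns, the English report wording, and the function name `extract_differentiation` are all hypothetical.

```python
import re

# Hypothetical patterns: the thesis does not list its actual expressions.
# The first matches a single grade, the second a range of grades.
GRADE = r"(?:poorly|moderately|well)"
PATTERNS = [
    re.compile(GRADE + r"\s+differentiated", re.IGNORECASE),
    re.compile(GRADE + r"\s*(?:-|to)\s*" + GRADE + r"\s+differentiated",
               re.IGNORECASE),
]


def extract_differentiation(text):
    """Return the longest text matched by any pattern, or None if none match."""
    # Collect every match from every pattern, then keep the longest one,
    # so that a range beats the single grade contained inside it.
    candidates = [m.group(0) for p in PATTERNS for m in p.finditer(text)]
    return max(candidates, key=len) if candidates else None


report = "Invasive adenocarcinoma, moderately to poorly differentiated."
print(extract_differentiation(report))
```

Choosing the longest candidate means a range such as "moderately to poorly differentiated" is preferred over the shorter "poorly differentiated" that the single-grade pattern also finds in the same sentence.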

Keywords/Search Tags:

Pathological Report, Named Entity Recognition, Attention Mechanism, BERT, Bi-LSTM, CRF

Related items

1	Research On Chinese Named Entity Recognition In Medical Field
2	Research On Named Entity Recognition Of Biological Pathogens Based On Neural Networks
3	Medical Named Entity Recognition Research Based On Deep Learning
4	Research And Implementation Of Medical Entity Recognition System Based On Double BiLSTM
5	Research On Named Entity Recognition And Entity Relationship Extraction Of Medical Data Text Based On Attention
6	Research On Named Entity Recognition In TCM Medical Records Based On BERT Pre-training Mode
7	Named Entity Recognition Of Electronic Medical Records Based On Deep Learning
8	Research On Method Of Medical Named Entity Recognition Based On Pre-trained Model
9	Deep Learning Based Medical Named Entity Recognition
10	Research And Application Of Key Techniques For Named Entity Recognition Of Electronic Medical Records Based On Deep Neural Network