Semi-self-supervised Learning Method Based On Semantic Text Similarity Of Small Sample Electronic Medical Record

Posted on:2023-01-05

Degree:Master

Type:Thesis

Country:China

Candidate:B Huang

Full Text:PDF

GTID:2544306620971199

Subject:Computer software and theory

Abstract/Summary:

Semantic text similarity calculation is a fundamental problem shared by research areas such as information retrieval, text clustering, semantic disambiguation, and automatic question answering. Its main goal is to measure the degree of similarity between texts. The development of medical informatization has accumulated a large amount of electronic medical record text data, and applying semantic text similarity methods to these data will help promote that development further.

Traditional methods for semantic text similarity in the medical field mainly capture shallow information from texts. Deep learning-based methods can capture deep semantic information, but they are highly sensitive to, and dependent on, labeled data. This thesis aims to overcome the language model's dependence on labeled data in the small-sample electronic medical record text similarity task, so that the model both scores well on small-sample data and produces high-quality sentence representations. Given the limitations imposed by the highly specialized nature of medical data, conservative data sets, and small data volumes, this thesis proposes two solutions for different data scenarios.

(1) A Multistage Bidirectional Cross Distillation Encoder (MBCDE) model suitable for unsupervised learning is proposed. MBCDE uses an improved self-supervised learning approach for unsupervised training of pre-trained language models on medical-domain corpora, which yields higher-quality sentence representations in the medical field. Exploiting the different performance of bidirectional encoders and cross-encoders on semantic text similarity tasks, and combining this with model distillation, a bidirectional cross-distillation encoder method is proposed. MBCDE fuses the predictions of the different types of encoders to obtain the final prediction, which is robust and of high quality.

(2) A Bidirectional Cross-Dynamic Round Robin Learning Encoder (BCDRRLE) model suitable for semi-supervised learning is proposed. BCDRRLE uses the dynamic polling learning mechanism proposed in this thesis to update the labels of the unlabeled data set and to bring the unlabeled data into model training; the model's learning in turn affects the unlabeled data. This polling mechanism both expands the volume of task data and improves the model's results.

The experimental results show that the MBCDE model, using the unsupervised method, outperforms the supervised method on the three electronic medical record semantic text similarity datasets, and that the BCDRRLE model, using the semi-supervised method, achieves significantly better results than the other models. The method proposed in this thesis offers a solution to the semantic text similarity problem for small-sample electronic medical records and a reference for other conservative fields that face dependence on labeled data.
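
The abstract does not give implementation details, so the following is only a minimal sketch of two ideas it mentions: fusing the predictions of a bidirectional encoder and a cross-encoder, and assigning labels to unlabeled sentence pairs for later training. The weighted-average fusion rule, the agreement threshold `max_gap`, and the names `fuse_scores` and `pseudo_label_round` are all assumptions made for illustration, not the thesis's actual MBCDE or BCDRRLE procedures. The two scorers are passed in as plain functions that return a similarity in [0, 1].

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]
Scorer = Callable[[str, str], float]  # hypothetical: returns a similarity in [0, 1]


def fuse_scores(bi_score: float, cross_score: float, weight: float = 0.5) -> float:
    """Weighted average of two similarity scores (an assumed fusion rule)."""
    return weight * bi_score + (1.0 - weight) * cross_score


def pseudo_label_round(
    unlabeled: List[Pair],
    bi_scorer: Scorer,
    cross_scorer: Scorer,
    max_gap: float = 0.2,
) -> Tuple[List[Tuple[Pair, float]], List[Pair]]:
    """One labeling round: accept pairs on which both scorers roughly agree.

    Accepted pairs receive the fused score as a pseudo-label; the rest stay
    unlabeled so that a later round can revisit them.
    """
    accepted: List[Tuple[Pair, float]] = []
    remaining: List[Pair] = []
    for a, b in unlabeled:
        s_bi, s_cross = bi_scorer(a, b), cross_scorer(a, b)
        if abs(s_bi - s_cross) <= max_gap:
            accepted.append(((a, b), fuse_scores(s_bi, s_cross)))
        else:
            remaining.append((a, b))
    return accepted, remaining
```

In a full training loop under these assumptions, the accepted pairs would be added to the training set, both encoders would be retrained, and the round would be repeated on the remaining pairs, so that the labels and the model update each other in turn.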

Keywords/Search Tags:

Small sample, Electronic medical record, Deep learning, Semantic text similarity, Self-supervised learning, Pre-trained language model

Related items

1	Research And Implementation Of Chinese Electronic Medical Record Text Semantic Segmentation Method Based On Deep Learning
2	Research On Medical Knowledge Extraction In Electronic Medical Records Based On Deep Learning
3	Research And Implementation Of Chinese Electronic Medical Record Named Entity Recognition Based On Deep Learning
4	Research Of Intelligent Hepatopathy Auxiliary Diagnosis System Based On Text Semantic Analysis Of Electronic Medical Records
5	Study On TCM Clinical Decision Support Based On Electronic Medical Record
6	Semantic Segmentation Of Cataract Surgery Images Based On Semi-supervised Deep Learning
7	Named Entity Recognition Of Electronic Medical Records Based On Deep Learning
8	Research On The Key Issues Of Small Sample Classification And Class Imbalance Classification In Medical Image Aided Diagnosis
9	Research On EMR Diagnosis Model Based On Deep Learning Integrated With Lexical Semantic
10	Recognition And Analysis Of Medical Records Based On Deep Learning