
Semi-self-supervised Learning Method Based On Semantic Text Similarity Of Small Sample Electronic Medical Record

Posted on: 2023-01-05 | Degree: Master | Type: Thesis
Country: China | Candidate: B Huang | Full Text: PDF
GTID: 2544306620971199 | Subject: Computer software and theory
Abstract/Summary:
Semantic text similarity calculation is a fundamental problem shared by research areas such as information retrieval, text clustering, semantic disambiguation, and automatic question answering. Its core task is to measure the degree of similarity between texts. The development of medical informatization has accumulated a large amount of electronic medical record text data, and applying semantic text similarity methods to this data helps advance the field. Traditional methods for semantic text similarity in the medical field mainly capture shallow information from texts. Deep learning-based methods can capture deep semantic information, but they are highly sensitive to, and dependent on, labeled data.

This thesis aims to overcome the language model's dependence on labeled data in the small-sample electronic medical record text similarity task, so that the model both scores well on small-sample data and produces high-quality sentence representations. To address the limitations caused by the highly specialized nature of medical data, conservative data sets, and small data volumes, the thesis proposes two solutions for different data scenarios.

(1) A Multistage Bidirectional Cross Distillation Encoder (MBCDE) model suitable for unsupervised learning is proposed. MBCDE uses an improved self-supervised learning approach to train pre-trained language models without supervision on medical-domain corpora, which yields higher-quality sentence representations for the medical field. Exploiting the different performance of the bidirectional encoder and the cross-encoder on semantic text similarity tasks, and combining the two through model distillation, a bidirectional cross-distillation encoder method is proposed. MBCDE fuses the predictions of the different encoder types to obtain the final prediction, which is robust and of high quality.

(2) A Bidirectional Cross-Dynamic Round Robin Learning Encoder (BCDRRLE) model suitable for semi-supervised learning is proposed. BCDRRLE uses the dynamic polling (round-robin) learning mechanism proposed in this thesis to update the labels on the unlabeled data set and to bring the unlabeled data into the training process. The model's learning in turn affects the unlabeled data. This polling mechanism both expands the amount of task data and improves the model's results.

The experimental results show that the unsupervised MBCDE model outperforms the supervised method on three electronic medical record semantic text similarity datasets, and that the semi-supervised BCDRRLE model achieves significantly better results than other models. The methods proposed in this thesis provide a solution to the semantic text similarity problem for small-sample electronic medical records, and a reference for other conservative fields that face the same dependence on labeled data.
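
The abstract describes fusing the predictions of two encoder types and using model output to label unlabeled sentence pairs, but it does not give the fusion rule or the labeling procedure. The Python sketch below is only an illustration under stated assumptions: a weighted average stands in for the fusion step, and two toy scorers (`jaccard` and `length_ratio`) stand in for the trained bidirectional encoder and cross-encoder. All function names, the `weight` parameter, and the sample sentences are hypothetical and do not come from the thesis.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]
Scorer = Callable[[str, str], float]


def fuse_scores(bi_score: float, cross_score: float, weight: float = 0.5) -> float:
    """Weighted average of two similarity scores (assumed fusion rule)."""
    return weight * bi_score + (1.0 - weight) * cross_score


def pseudo_label(
    pairs: List[Pair],
    bi_scorer: Scorer,
    cross_scorer: Scorer,
    weight: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Attach a fused similarity score to every unlabeled sentence pair."""
    return [
        (a, b, fuse_scores(bi_scorer(a, b), cross_scorer(a, b), weight))
        for a, b in pairs
    ]


def jaccard(a: str, b: str) -> float:
    """Toy stand-in for a bidirectional encoder: token-set overlap."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def length_ratio(a: str, b: str) -> float:
    """Toy stand-in for a cross-encoder: ratio of token counts."""
    la, lb = len(a.split()), len(b.split())
    longest = max(la, lb)
    return min(la, lb) / longest if longest else 0.0


if __name__ == "__main__":
    unlabeled = [
        ("patient reports chest pain", "patient reports chest discomfort"),
        ("no fever observed", "patient denies any history of diabetes"),
    ]
    for a, b, score in pseudo_label(unlabeled, jaccard, length_ratio):
        print(f"{score:.2f}  {a!r} vs {b!r}")
```

In a round-robin setting, a step like `pseudo_label` would presumably be repeated after each training round so that the labels on the unlabeled pairs track the current models. How the thesis schedules those rounds is not stated in the abstract.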
Keywords/Search Tags:Small sample, Electronic medical record, Deep learning, Semantic text similarity, Self-supervised learning, Pre-trained language model