
Research On Medical Dialogue System Based On Text Sequence Generation

Posted on: 2024-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: G J Yan
GTID: 2568306920950899
Subject: Computer Science and Technology
Abstract/Summary:
With the progress of science and technology, people's living standards have improved greatly, and medical technology has advanced along with it. More and more people are paying attention to health issues. However, the uneven distribution of medical resources remains a serious problem, and the time and travel costs of medical care still stand in the way of many people's everyday health needs. In recent years, dialogue systems have been able to help people complete a growing range of tasks, such as booking tickets, ordering meals, and chatting. Using dialogue systems to help patients obtain medical services more quickly and in a more timely manner has therefore become a feasible direction. The goal of a medical dialogue system is to assist doctors and patients with three types of medical services: diagnosis, treatment, and consultation. However, the development of medical dialogue systems is hindered by the scarcity and incompleteness of medical resources. A medical dialogue system is generally decomposed into three subtasks: natural language understanding, dialogue policy learning, and natural language generation.

Existing medical dialogue systems suffer from three main problems. First, in terms of datasets, the coverage of medical services is incomplete, the range of departments included is limited, the labeled information is not comprehensive, and the variety of diseases and medical entities provided is small. This lack of information severely restricts the services that medical dialogue systems can provide, making it difficult to broaden the target population and improve service quality. Second, in terms of tasks, most current medical dialogue work focuses on only one subtask of task-oriented dialogue rather than building the complete dialogue task, which leaves the system with missing functions or insufficient interpretability. Third, in terms of data utilization, manual annotation is very costly, so large-scale manually annotated data are hard to obtain; existing medical dialogue systems neither make full use of the limited data resources nor fully mine the semantic information in medical data.

To address these shortcomings, this thesis makes the following contributions:
1) It releases ReMeDi, a multi-service, multi-department dataset for Chinese medical dialogue. ReMeDi contains 96,965 real conversations between doctors and patients, 1,557 of which carry fine-grained human labels. The dataset spans 40 medical departments and covers 843 types of diseases, 5,228 types of medical entities, and 3 specific medical services.
2) It proposes a serialized unified medical dialogue framework that defines the complete set of dialogue subtasks and unifies them through serialized text generation, making it more convenient and efficient to build a complete dialogue system.
3) It proposes a multi-stage medical dialogue learning strategy. To make full use of limited data resources, the first stage uses a pseudo-annotation algorithm to exploit a large amount of unlabeled data, and the second stage uses three natural perturbation methods to expand the manually annotated data. To mine more accurate semantic information, the third stage constructs positive and negative samples based on the characteristics of medical data, so that unrelated medical entities are more clearly separated in the semantic representation space.

This thesis presents the characteristics of the constructed dataset, and a comparison with previous related datasets shows that ReMeDi provides richer information. Experiments demonstrate that the unified medical dialogue framework performs well on all three subtasks, and that the proposed multi-stage learning strategy further improves the model's performance on all three subtasks.
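The abstract does not specify how the three subtasks are serialized. As a minimal sketch, assuming a hypothetical format in which understanding labels, policy actions, and the response are concatenated behind the markers `[NLU]`, `[POLICY]`, and `[NLG]`, one turn's labels could be flattened into a single target string for a sequence-to-sequence model and parsed back afterwards. The names `TurnLabels`, `serialize`, and `deserialize`, the marker tokens, and the `type=value` entity encoding are illustrative assumptions, not the thesis's actual format.

```python
from dataclasses import dataclass, field

# Hypothetical segment markers; the real serialization format is not given in the abstract.
NLU_TAG, POLICY_TAG, NLG_TAG = "[NLU]", "[POLICY]", "[NLG]"


@dataclass
class TurnLabels:
    """Labels for one dialogue turn under the three subtasks (illustrative schema)."""
    entities: list[tuple[str, str]] = field(default_factory=list)  # (entity type, value) from understanding
    actions: list[str] = field(default_factory=list)                # actions chosen by the dialogue policy
    response: str = ""                                              # reply produced by generation


def serialize(labels: TurnLabels) -> str:
    """Flatten all three subtask outputs into one target string."""
    nlu = " ; ".join(f"{t}={v}" for t, v in labels.entities)
    policy = " ; ".join(labels.actions)
    return f"{NLU_TAG} {nlu} {POLICY_TAG} {policy} {NLG_TAG} {labels.response}"


def deserialize(text: str) -> TurnLabels:
    """Recover structured labels from a generated string (inverse of serialize).

    Assumes values never contain the markers, " ; ", or "=" themselves.
    """
    nlu_part, rest = text.split(POLICY_TAG, 1)
    policy_part, response = rest.split(NLG_TAG, 1)
    nlu_part = nlu_part.replace(NLU_TAG, "", 1).strip()
    entities = []
    for item in nlu_part.split(" ; "):
        if item:
            entity_type, value = item.split("=", 1)
            entities.append((entity_type.strip(), value.strip()))
    actions = [a.strip() for a in policy_part.split(" ; ") if a.strip()]
    return TurnLabels(entities, actions, response.strip())
```

Training one generator on such strings lets a single model emit all three subtask outputs in order, and parsing the output back into structure keeps the intermediate understanding and policy decisions inspectable.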
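For the third stage, the abstract says only that positive and negative samples are built from properties of medical data. A common objective for this kind of contrastive learning is the InfoNCE loss, which pulls an anchor representation toward its positive and pushes it away from negatives. The sketch below assumes plain Python vectors, cosine similarity, and a temperature of 0.1; the function names are hypothetical, and it illustrates the general technique rather than the loss actually used in the thesis.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor: list[float], positive: list[float],
             negatives: list[list[float]], temperature: float = 0.1) -> float:
    """InfoNCE loss: -log(exp(s(a,p)/t) / sum_k exp(s(a,k)/t)).

    The sum runs over the positive and every negative.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]
```

The loss is near zero when the anchor matches its positive and is far from the negatives, and grows when a negative is closer than the positive; choosing unrelated medical entities as negatives is what pushes their representations apart.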
Keywords/Search Tags: Dialogue dataset, Medical dialogues, Text sequence generation, Contrastive learning, Data augmentation