Study On Multi-label Classification Of The Medical Dispute Judgment Documents

Posted on: 2020-10-11
Degree: Master
Type: Thesis
Country: China
Candidate: L Yang
GTID: 2404330623459908
Subject: Computer technology
Abstract/Summary:
As a valuable source of precedent, medical dispute judgment documents (MDJD) serve as an important reference for resolving doctor-patient conflicts. If similar cases can be accurately recommended to users, both doctors and patients can better understand liability and compensation, and the quality and efficiency of mediation improve. In the recommendation process, documents can be classified in advance into key categories such as "department" and "medical negligence behaviors". Historical cases unrelated to the user's input case can then be filtered out quickly and accurately, which improves the accuracy and efficiency of document recommendation and reduces the number of cases that require similarity calculation. "Department" and "medical negligence behaviors" are the two most important multi-label categories in MDJD.

The research faces two main problems. First, content related to the classification themes makes up a small proportion of each document and is scattered throughout it, so when existing text classification algorithms are applied directly, the features of the key information tend to be weak or neglected altogether. Second, the unbalanced distribution of classes among labels makes classifier results unsatisfactory. To address these problems, key information is first extracted from the original text to generate a content outline that condenses the document; then, based on that content outline, resampling and ensemble learning are combined to solve the class imbalance problem in multi-label classification and improve performance. The specific research work includes the following aspects:

(1) Content outline generation based on word granularity (keyword extraction). Key noun phrases discriminate well in the "department" classification task, so word-granularity text extraction is adopted to generate the content outline. Because BiLSTM-CRF lacks semantic information when applied to Chinese character-level sequence annotation, an improved BiLSTM-CRF model is proposed to identify the key phrases related to the "department" classification task. Experiments show that the improved BiLSTM-CRF model improves keyword extraction performance.

(2) Content outline generation based on sentence granularity (extractive summary generation). For "medical negligence behaviors", noun phrases can no longer fully express the negligence behavior, so key sentences are extracted to generate the content outline. Most existing extractive summarization models adopt an encoder-decoder architecture or account for redundancy between sentences, which leads to poor performance in this application. Therefore, an attention-based hierarchical BiLSTM model is proposed for sentence extraction. Experiments demonstrate that the attention-based hierarchical BiLSTM model improves the quality of the content outline to a certain extent and filters information efficiently.

(3) Multi-label classification based on content outline. To address class imbalance in multi-label classification, an improved comprehensive sampling algorithm (RCS) is proposed, and an ensemble multi-label classification algorithm, RCS-Bagging, is further proposed by combining RCS with the Bagging algorithm. Specifically, RCS resamples the content outlines to generate multiple different sample sets, a multi-label classifier is trained on each sample set, and these classifiers are then combined with a certain strategy, reducing the impact of class imbalance on multi-label classification. Experiments show that the RCS-Bagging algorithm, which combines base classifiers by a one-vote strategy, effectively improves recall and F1-score in the multi-label classification tasks while reducing Hamming loss. This demonstrates the feasibility and effectiveness of the proposed solution for MDJD multi-label classification.
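
The resample, train, and combine pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not specify how RCS works, so `resample` below is a naive stand-in (a bootstrap draw plus duplication of documents carrying the rarest label). The label names, the token-lookup base classifier, and the reading of "one vote" as "keep a label if at least one base classifier predicts it" are all assumptions made for illustration.

```python
import random
from collections import defaultdict

LABELS = ["surgery", "obstetrics", "pediatrics"]  # hypothetical department labels


def resample(data, rng):
    """Naive stand-in for RCS (the abstract gives no details of RCS).

    Draws a bootstrap sample, then duplicates documents that carry the
    rarest label until that label's count reaches the count the most
    frequent label had before duplication. Assumes every document has
    at least one label.
    """
    bag = [rng.choice(data) for _ in data]
    counts = {lab: sum(lab in y for _, y in bag) for lab in LABELS}
    present = [lab for lab in LABELS if counts[lab] > 0]
    rare = min(present, key=counts.get)
    target = max(counts.values())
    minority = [doc for doc in bag if rare in doc[1]]
    while counts[rare] < target:
        bag.append(rng.choice(minority))
        counts[rare] += 1
    return bag


def train(bag):
    """Toy base classifier: remember which labels each token co-occurred with."""
    token_labels = defaultdict(set)
    for tokens, labels in bag:
        for tok in tokens:
            token_labels[tok] |= labels
    return token_labels


def predict(model, tokens):
    """Predict every label seen with any of the given tokens."""
    out = set()
    for tok in tokens:
        out |= model.get(tok, set())
    return out


def one_vote(predictions):
    """Assumed 'one vote' rule: a label is kept if any base classifier predicts it."""
    combined = set()
    for pred in predictions:
        combined |= pred
    return combined


def hamming_loss(true_sets, pred_sets):
    """Fraction of (document, label) decisions that are wrong."""
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * len(LABELS))


def fit_ensemble(data, n_models=5, seed=0):
    """Train one toy classifier per resampled set."""
    rng = random.Random(seed)
    return [train(resample(data, rng)) for _ in range(n_models)]


def ensemble_predict(models, tokens):
    """Combine the base classifiers' predictions with the one-vote rule."""
    return one_vote(predict(m, tokens) for m in models)
```

Under this reading, a union-style vote can only add labels relative to any single base classifier, which is consistent with the reported gain in recall. A real system would replace the toy classifier with a proper multi-label model and `resample` with the actual RCS procedure.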
Keywords/Search Tags:multi-label classification, medical dispute, content outline, sequence label, class imbalance