Research On Medical Literature Retrieval Based On Random Forest Algorithm

Posted on: 2019-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: G G Zhuang
GTID: 2428330572455299
Subject: Computer software and theory
Abstract/Summary:
With the continuing advance of modern medicine, the volume of medical literature has grown rapidly, and it is becoming increasingly difficult to find, within this mass of literature, the documents most relevant to a patient's symptom information. The main difficulties are that a patient's symptom information is often incomplete and that different diseases can present similar symptoms. This dissertation studies medical literature retrieval for clinical decision support: given a collection of medical documents, how to find the target literature most relevant to a patient's symptom information, so as to help the clinician diagnose the disease. The work focuses on the following aspects.

Firstly, to process the patient's symptom information, this dissertation proposes a keyword co-occurrence method. The keywords of each document in the collection are extracted to construct a literature keyword set. The medical terms in the MeSH standard lexicon are extracted to construct a medical terminology set. Abbreviations are extracted from each document in the collection and normalized to construct an abbreviation set. These three sets are combined into a set of standard keywords. Using this set, the keyword co-occurrence method scans the patient's symptom information and extracts the higher-value keywords, which optimizes the symptom description and completes the query optimization.

Secondly, to optimize the ranking of search results, this dissertation uses the random forest algorithm to build a model that predicts, from the selected features, the degree of relevance between the query and a document. Relevance between the patient's symptom information and a document is divided into three levels, "definitely relevant", "potentially relevant" and "definitely not relevant", with corresponding values 2, 1 and 0. Experiments show that, compared with the basic model, the re-ranked results produced by the random forest model achieve improved evaluation results.

Thirdly, for feature extraction, this dissertation extracts four basic similarity features between the query and the document and also constructs a citation network from the citation relationships in the document collection. On this network, the PageRank algorithm is used to compute the PageRank value of each document, and the HITS algorithm is used to compute its Authority value. Both values are added as features to the random forest model. Experiments show that after adding these two features, retrieval performance improves to some extent, and each evaluation index improves to varying degrees.

Finally, this dissertation gives detailed explanations of the experimental methods and experimental results.
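
As an illustration of the citation-network features described above, the following is a minimal sketch, not the dissertation's implementation. It computes PageRank and HITS Authority values by plain power iteration on a toy citation graph. The graph, the document IDs, the damping factor of 0.85, the iteration count and all function names are assumptions made for this example; the abstract does not state the parameters actually used.

```python
def _nodes(citations):
    """All documents that cite or are cited in {doc: [docs it cites]}."""
    return set(citations) | {t for targets in citations.values() for t in targets}


def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank; damping and iterations are assumed values."""
    nodes = _nodes(citations)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Documents that cite nothing spread their rank evenly over all nodes.
        dangling = sum(rank[v] for v in nodes if not citations.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in citations.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


def hits_authority(citations, iterations=50):
    """HITS authority scores, L2-normalized after every update."""
    nodes = _nodes(citations)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 0.0 for v in nodes}
    for _ in range(iterations):
        # Authority of a document: sum of hub scores of the documents citing it.
        auth = {v: 0.0 for v in nodes}
        for src, targets in citations.items():
            for t in targets:
                auth[t] += hub[src]
        norm = sum(a * a for a in auth.values()) ** 0.5 or 1.0
        auth = {v: a / norm for v, a in auth.items()}
        # Hub score of a document: sum of authority scores of what it cites.
        hub = {v: sum(auth[t] for t in citations.get(v, [])) for v in nodes}
        norm = sum(h * h for h in hub.values()) ** 0.5 or 1.0
        hub = {v: h / norm for v, h in hub.items()}
    return auth


# Hypothetical toy collection: A and B cite C, and C cites D.
toy_citations = {"A": ["C"], "B": ["C"], "C": ["D"], "D": []}
pr = pagerank(toy_citations)
auth = hits_authority(toy_citations)

# Two citation-network features per document, to be appended to the
# query-document similarity features before training a ranking model.
citation_features = {doc: [pr[doc], auth[doc]] for doc in sorted(pr)}
```

In a setup like this, each query-document pair would carry its similarity features plus the two values from `citation_features`, and a random forest would be trained on the graded relevance labels 2, 1 and 0.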
Keywords/Search Tags: Clinical Decision Support, Random Forest, Query Expansion, Lucene, Literature Retrieval