
Extraction And Analysis Of Relationships Between Drugs And Diseases From Medical Literature

Posted on: 2011-11-09, Degree: Master, Type: Thesis
Country: China, Candidate: M Z Wu, Full Text: PDF
GTID: 2144360305458349, Subject: Information Science
Abstract/Summary:
Objective: The rapid progress of biomedical research and the emergence of electronic data have left researchers facing an overwhelming volume of literature, which has become a bottleneck in exploring relationships among biomedical entities. Meanwhile, in clinical practice, deaths due to adverse drug effects and sharply increased inpatient and outpatient costs due to improper medication have become the main problems in the rational use of drugs. In this study, we take Mubaid's statistics-based text mining method as our basis. After suitable adjustment, we extract the relationships between aspirin and disease entities, as well as between cisplatin and diseases, in order to confirm whether Mubaid's method can be used in this field. We also expect to find valuable information on adverse drug reactions in the biomedical literature, so as to warn of such adverse events effectively, provide a technical reference for the rational clinical use of drugs, and thereby better safeguard public health.

Materials and methods: This study was based on Mubaid's statistics-based text mining method. The statistical significance of each disease concept occurring in the literature on adverse drug effects was calculated using three parameters (theoretical value, true value and Z-score), which yielded the significant co-occurrences between disease and adverse effect concepts. The detailed method was as follows. We retrieved the literature on adverse effects of aspirin and of cisplatin from the PubMed database as the study groups, and counted the frequency of disease concepts in each literature collection. Two baseline groups were set up to compare the frequency of disease concepts between the study group and the baseline group. Baseline one excluded papers about the study topic (aspirin and cisplatin); baseline two excluded papers discussing the superclass subjects of the study topic (salicylic acids, chlorine compounds, nitrogen compounds and platinum compounds).
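The abstract does not give the formulas behind the theoretical value, true value and Z-score, so the following is only a minimal sketch under stated assumptions. It assumes that the theoretical value is the count expected in the study group if the concept occurred at the baseline rate, that the true value is the observed count, and that the Z-score uses a binomial standard deviation. The function name and parameters are hypothetical and are not taken from the thesis or from Mubaid's method.

```python
import math
from typing import Optional


def cooccurrence_z_score(
    study_hits: int,   # documents in the study group that mention the concept (true value)
    study_total: int,  # total documents in the study group
    base_hits: int,    # documents in the baseline group that mention the concept
    base_total: int,   # total documents in the baseline group
) -> Optional[float]:
    """Z-score of a concept's frequency in the study group versus a baseline.

    Assumption: the theoretical value is study_total * (base_hits / base_total),
    and the spread is the binomial standard deviation around that value.
    Returns None when the baseline rate is 0 or 1, where the score is undefined
    (for example, a concept that occurs only in the study group).
    """
    if study_total <= 0 or base_total <= 0:
        raise ValueError("group sizes must be positive")
    p = base_hits / base_total
    if p == 0.0 or p == 1.0:
        return None
    theoretical = study_total * p
    sd = math.sqrt(study_total * p * (1.0 - p))
    return (study_hits - theoretical) / sd
```

Under these assumptions, a concept found in 30 of 100 study documents but only 100 of 1000 baseline documents has a theoretical value of 10 and a clearly positive Z-score, while a concept absent from the baseline returns None and would be handled separately.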
To distinguish the results extracted from different fields, two technical routes were used: analysis of MeSH terms and natural language analysis. That is, using document frequency-based and term frequency-based parameter calculation methods, the theoretical value, true value and Z-score of each biomedical entity concept in the study group literature collection were calculated in order to compare frequency differences between the study group and the baseline groups. To evaluate the extraction method, concepts that had a high Z-score or occurred only in the study group were analyzed to determine which of them were therapeutic uses or side effects of aspirin and cisplatin already listed in authoritative websites, pharmacopoeias, textbooks and drug instructions. For concepts not included in these authoritative sources, we determined the drug effect by reading the corresponding literature. Receiver operating characteristic curve analysis was used to compare outcomes between the baseline groups and between the parameter calculation methods.

Results: For aspirin, the rates of newly extracted therapeutic applications and adverse reactions in the two baselines were 36.6% (48/131) and 36.7% (47/128), respectively. For cisplatin, the rates of such new relationships were both 51.1% (68/133 and 69/135). A chi-square test showed no difference in mining results between the two baseline groups. There were some differences between the outcomes for aspirin and cisplatin, indicating that this extraction method is domain-specific. Among concepts that appeared only in the study group, the proportions of newly extracted relations were 40.43%, 47.83%, 59.57% and 56.82%, respectively.
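The baseline comparison can be checked from the reported counts. The sketch below computes a Pearson chi-square statistic for a 2x2 table using only the standard library. It assumes no continuity correction, since the abstract does not say which variant was used, and the function name is hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(hits_a: int, total_a: int, hits_b: int, total_b: int) -> Tuple[float, float]:
    """Pearson chi-square test (1 degree of freedom, no continuity correction)
    comparing two proportions hits_a/total_a and hits_b/total_b.

    Returns (chi-square statistic, p-value).
    """
    a, b = hits_a, total_a - hits_a
    c, d = hits_b, total_b - hits_b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero; the test is undefined")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts reported in the abstract for baseline one versus baseline two.
aspirin_chi2, aspirin_p = chi_square_2x2(48, 131, 47, 128)
cisplatin_chi2, cisplatin_p = chi_square_2x2(68, 133, 69, 135)
```

With these counts both p-values are far above 0.05, in line with the reported absence of a difference between the two baseline groups.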
Receiver operating characteristic curve analysis showed that, in the aspirin group, baseline one was superior to baseline two, and document frequency-based parameter calculation gave better results than term frequency-based calculation. In the cisplatin group, results were not consistent between the two baseline groups or between the two parameter calculation methods. However, none of the differences between the compared groups was significant.

Conclusions: Our study showed that this co-occurrence and statistics-based text mining method can extract the relationships between aspirin and disease entities, as well as between cisplatin and diseases, which confirms that Mubaid's method can be used in this field. We also extracted some therapeutic uses and side effects of aspirin and cisplatin that are not included in authoritative websites, pharmacopoeias, textbooks or drug instructions. This warning information can provide a technical reference for the rational clinical use of drugs.
Keywords/Search Tags: biomedical entity, relation extraction, text mining, aspirin, cisplatin, receiver operating characteristic curve