| The volume of biomedical data is growing explosively. This wealth of data gives biomedical scientists abundant theoretical support for developing new drugs, but even reading day and night they cannot cover all of the literature, let alone extract the information hidden in it. Systems that automatically extract and analyze information from biomedical data are therefore increasingly important. Meanwhile, as biomedical research advances, a single data source can no longer meet growing information needs, so automatically discovering relationship models from heterogeneous data has become very important in the biomedical domain. This dissertation studies sentiment relationships between biological entities in the biomedical literature, the extraction of potential relations, and the integration of heterogeneous data. As the number of formats in which information is stored grows, information drawn from a single data source no longer meets researchers' needs; scientific databases and scientific literature must therefore be integrated in order to discover knowledge across heterogeneous databases. The dissertation studies two latent semantic analysis models: one based on results integration and one based on data integration. The former analyzes each data source separately and then integrates the results. The latter integrates intermediate results into a new data set and then continues the analysis on it. Experiments verify the feasibility and effectiveness of both methods. The dissertation uses a graph-based semi-supervised learning algorithm, the label propagation method, to automatically identify relationships between biological entities. Automatically extracting sentiment relationships between entities from text is an important direction in text mining. 
Most current studies use supervised learning, which usually performs well but requires a large number of labeled samples as training data; producing these labels costs considerable manpower and time and reduces efficiency. The label propagation method iteratively passes label information from each node in the graph to its neighboring nodes along weighted edges until a globally stable state is reached, from which the labels of the unlabeled nodes are inferred. It can improve learning performance when training data are scarce. In this dissertation, a context-based ABC model is used to discover multi-level potential relationships between entities. Instead of the traditional approach of building on direct disease-drug relationships, it uses data sets that are not directly correlated, namely disease-gene and gene-drug relationships, as the data source, in order to analyze potential relationships between disease and drug more comprehensively. |
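
The label propagation step described above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function name `label_propagation`, the toy five-node graph, and the edge weights are assumptions chosen only for illustration, and the update rule is the common textbook variant (row-normalized weights, labeled nodes clamped after every pass).

```python
def label_propagation(weights, labels, n_classes, max_iter=1000, tol=1e-9):
    """Infer labels for unlabeled graph nodes (illustrative sketch).

    weights: n x n list of non-negative edge weights (0 means no edge).
    labels:  list of length n; a class index for labeled nodes, None otherwise.
    """
    n = len(weights)

    # Row-normalize edge weights so each node averages over its neighbors.
    trans = []
    for row in weights:
        total = sum(row)
        trans.append([w / total if total > 0 else 0.0 for w in row])

    def clamp(dist):
        # Labeled nodes keep their known label after every pass.
        for i, lab in enumerate(labels):
            if lab is not None:
                dist[i] = [1.0 if c == lab else 0.0 for c in range(n_classes)]
        return dist

    # Unlabeled nodes start from a uniform class distribution.
    dist = clamp([[1.0 / n_classes] * n_classes for _ in range(n)])

    for _ in range(max_iter):
        new = []
        for i in range(n):
            row = [sum(trans[i][j] * dist[j][c] for j in range(n))
                   for c in range(n_classes)]
            s = sum(row)
            # Isolated nodes (no edges) keep their previous distribution.
            new.append([v / s for v in row] if s > 0 else list(dist[i]))
        new = clamp(new)
        delta = max(abs(new[i][c] - dist[i][c])
                    for i in range(n) for c in range(n_classes))
        dist = new
        if delta < tol:  # global stability reached
            break

    # Each node takes its most probable class.
    return [max(range(n_classes), key=lambda c: d[c]) for d in dist]


# Hypothetical chain graph 0-1-2-3-4 with a weak edge between nodes 2 and 3.
W = [
    [0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.2, 0.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0],
]
# Only the two end nodes are labeled (class 0 and class 1).
predicted = label_propagation(W, [0, None, None, None, 1], n_classes=2)
print(predicted)  # -> [0, 0, 0, 1, 1]
```

In this toy graph only two of the five nodes carry labels, yet the weak edge between nodes 2 and 3 is enough to split the chain into two groups, which is the sense in which the method can help when labeled training data are scarce.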