| The rapid growth of public data in chemistry and biology gives drug research and development more opportunities for large-scale data mining. These data sets are increasing in scale and complexity, which makes them difficult to represent and store. Representing the data as Linked Data promotes the integration of data sets with other web resources. RDF turns data into a machine-readable form and can express additional information through extended vocabularies. By integrating and mining these data, we can analyze the complex characteristics of drugs. Drug network analysis built on bioactivity data sets, which combines drug discovery with complex network analysis, is an inevitable trend in modern drug discovery technology. For large-scale data processing and graph analysis, the asynchronous parallel computing framework GraphLab shows good performance, and in a distributed environment it allows models of large-scale data to be constructed and analyzed. In this paper, we propose a GraphLab-based distributed graph model construction system for semantic data sets. Within this system, by applying a node similarity algorithm based on attribute co-occurrence to the ChEMBL database, developed by the European Bioinformatics Institute, we construct a bipartite graph based on the “compound-target” network. Then, using the GraphLab framework, we calculate activity-based similarity between natural products. To guide activity assay experiments, we recommend activities between natural products with a high degree of similarity. Finally, these recommendations benefit the discovery and selection of drug targets in the early stages of drug research and development. |
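
The idea of similarity based on attribute co-occurrence over a "compound-target" bipartite graph can be illustrated with a minimal single-machine sketch. It does not use GraphLab and is not the system described above. The Jaccard index is assumed here as the co-occurrence measure, since the abstract does not give the exact formula, and the compound names, target names, and the 0.5 threshold are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: compound -> set of targets it is active against.
# In the bipartite graph, compounds and targets are the two node types.
activities = {
    "compound_A": {"target_1", "target_2", "target_3"},
    "compound_B": {"target_2", "target_3", "target_4"},
    "compound_C": {"target_5"},
}


def jaccard(a, b):
    """Fraction of targets shared by two compounds (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(graph):
    """Similarity for every unordered pair of compounds in the graph."""
    return {
        (u, v): jaccard(graph[u], graph[v])
        for u, v in combinations(sorted(graph), 2)
    }


def recommend(graph, threshold=0.5):
    """For each sufficiently similar pair, suggest the targets that are
    known for one compound but not yet for the other."""
    suggestions = {compound: set() for compound in graph}
    for (u, v), score in pairwise_similarity(graph).items():
        if score >= threshold:
            suggestions[u] |= graph[v] - graph[u]
            suggestions[v] |= graph[u] - graph[v]
    return suggestions


similarities = pairwise_similarity(activities)
recommendations = recommend(activities)
```

In this toy example, `compound_A` and `compound_B` share two of their four distinct targets, so each is suggested the other's remaining target as a candidate for an activity assay. A distributed implementation would partition the same pairwise computation across graph vertices instead of looping over all pairs on one machine.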