Gene Ontology Terms Labeling With Multi-label Classification

Posted on: 2016-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: J R Wang
Full Text: PDF
GTID: 2180330461963143
Subject: Computer software and theory
Abstract/Summary:
With the development of science and technology, researchers have found that genes are a direct cause of disease, and they are therefore paying more attention to genes. A large number of biomedical papers are related to genes, and researchers would like these papers to be labeled with genetic terms. Labeling them by manual analysis, however, costs time and effort. It is therefore essential to propose a method that labels papers with genetic terms automatically.

We present an approach to labeling biomedical papers with GO (Gene Ontology) terms using multi-label classification. First, we search for and collect biomedical papers from the PubMed website using gene terms as keywords, label the papers by means of MeSH (Medical Subject Headings) terms, and construct the training datasets. Second, we design a multi-label classification system using the training datasets. Then we describe in detail the two multi-label classification algorithms used to label the test datasets, namely SCRank-SVM (Simplified Constraints Rank-SVM) and ReguRBFML (Regularized RBF neural network multi-label). Finally, we present five multi-label evaluation metrics and use them to evaluate the algorithms.

Based on the relevance of pairs of labels, SCRank-SVM first defines the decision boundary and the separating margin of the multi-label classification system without the bias term b. The separating margin is then maximized while the ranking loss function is minimized, and the multi-label model is built by solving a quadratic optimization problem. Because the term b is absent, SCRank-SVM has fewer constraints, which gives it a better solution space than Rank-SVM.

ReguRBFML mainly addresses the low efficiency of multi-label classification algorithms. To reduce running time while achieving high accuracy, this thesis extends the traditional RBF neural network to multi-label classification. The centers of the RBF network are obtained by clustering with the SOM (Self-Organizing Map) algorithm. We then model the regularized system and apply ridge regression to solve for the weight vector of the network. Finally, a threshold function is used to predict the labels.

Experimental results show that, according to the five metrics, our algorithms improve on traditional multi-label algorithms by 3 to 18 percent on the biomedical paper datasets and by 1 to 6 percent on six public multi-label datasets.
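
The ReguRBFML steps described in the abstract (RBF centers, ridge-regression weights, thresholded outputs) can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the Gaussian width, the regularization strength `lam`, the 0.5 threshold, and the toy data are all assumptions made for illustration, and the centers are fixed by hand as a stand-in for the SOM clustering step.

```python
import numpy as np

def rbf_features(X, centers, width):
    # Gaussian activation of every sample against every center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_ridge_weights(Phi, Y, lam):
    # Closed-form ridge regression: solve (Phi^T Phi + lam * I) W = Phi^T Y.
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ Y)

def predict_labels(X, centers, width, W, threshold=0.5):
    # A label is predicted as relevant when its score reaches the threshold.
    scores = rbf_features(X, centers, width) @ W
    return (scores >= threshold).astype(int)

# Toy data (assumed): 4 samples, 2 features, 2 labels.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])

# Hand-picked centers standing in for centers found by SOM clustering.
centers = np.array([[0.05, 0.0], [0.95, 1.0]])

W = fit_ridge_weights(rbf_features(X, centers, 0.5), Y, lam=1e-3)
pred = predict_labels(X, centers, 0.5, W)
```

Because the weights come from a single linear solve rather than iterative training, a design of this kind is one plausible reading of how the method targets lower running time.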
Keywords/Search Tags:GO genetics terms, multi-label classification, Rank-SVM, RBF neural network