
Fuzzy Integral For Prediction Of The Protein Function

Posted on: 2013-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhao
GTID: 2230330371483126
Subject: Computer application technology
Abstract/Summary:
In the post-genome era, the research focus of bioinformatics has shifted to proteomics. Proteins are biomacromolecules, and all activities of living organisms involve proteins. Research on protein function is useful for diagnosing and treating diseases, guiding biological experiments, and revealing the mysteries of life. Predicting protein function by bioinformatics means is therefore of great importance.

After many years of development, there is a great variety of methods for predicting protein function, and they can be classified in different ways. According to the homology of the data, they can be divided into homology methods and non-homology methods. According to the machine learning technique used, they can be classified into supervised, semi-supervised, and unsupervised methods. Supervised methods are usually used when the training data are sufficient; in that case, protein function prediction is equivalent to a classification problem. Support vector machines, decision trees, and Bayesian classifiers are common classification methods, and these methods and their improved variants have often been used in protein function prediction.

In the field of information fusion, the multiple classifier ensemble is a common method. It can improve a system's ability to suppress noise and can extract more useful information from the data. Traditionally, a multiple classifier ensemble is based on a single data source. When only one data source is available, a multiple classifier ensemble can greatly improve the classification results if the classifiers are sufficiently diverse. However, a multiple classifier ensemble is limited by the characteristics of the single data source.
So it cannot significantly improve the classification results, and the improvement easily reaches a bottleneck. Heterologous data fusion can solve this problem to some extent: multiple data sources contain more useful information, and the information in different data sources is complementary. For example, suppose data source A contains much more information about protein function X and data source B contains much more information about protein function Y. Then the fusion of A and B can improve the prediction results for both X and Y.

Based on the advantages of heterologous data fusion, we use the fuzzy integral to predict protein function. First, a base classifier is applied to the samples from each data source and outputs a probabilistic decision value. Then the probabilistic outputs of the base classifiers are fused by our fuzzy integral method. This paper chooses the support vector machine as the base classifier for its better robustness. Compared with the weighted average, a fuzzy measure can reflect both the weights of the different data sources and the interactions among them, so the fuzzy measure method can achieve a better fusion result than the weighted average method.

The fuzzy measure is crucial to the fuzzy integral and has a great influence on the result of data fusion. However, a traditional fuzzy measure is difficult to obtain, and the difficulty becomes pronounced when the number of objects is large. In this paper we therefore use the gλ fuzzy measure. When n data sources are to be fused, only their fuzzy densities are needed, and all values of the gλ fuzzy measure can then be calculated from them. The number of fuzzy densities equals the number of fusion objects. There are many methods for obtaining the fuzzy densities, and global optimization methods can give better results.
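
The construction of the gλ fuzzy measure from the fuzzy densities, and its use in fusing classifier outputs, can be illustrated with a minimal sketch. Several points here are assumptions rather than details given in the abstract: the abstract does not say whether the Sugeno or the Choquet form of the fuzzy integral is used, so the Choquet integral is chosen for illustration; λ is found by simple bisection on the standard condition ∏(1 + λ·g_i) = 1 + λ with λ > -1; and the function names, tolerances, and example numbers are hypothetical.

```python
def solve_lambda(densities, additive_tol=1e-4, eps=1e-6):
    """Find lambda > -1 with prod(1 + lambda * g_i) = 1 + lambda.

    Assumes at least two sources and every density strictly in (0, 1).
    """
    if len(densities) < 2 or any(not 0.0 < g < 1.0 for g in densities):
        raise ValueError("need >= 2 densities, each strictly between 0 and 1")

    total = sum(densities)
    if abs(total - 1.0) < additive_tol:
        return 0.0  # densities (almost) sum to 1: the measure is additive

    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    if total > 1.0:
        lo, hi = -1.0, -eps          # root lies in (-1, 0)
    else:
        lo, hi = eps, 1.0            # root lies in (0, +inf)
        while f(hi) < 0.0:           # expand until the root is bracketed
            hi *= 2.0

    f_lo = f(lo)
    for _ in range(200):             # plain bisection
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo < 0.0) == (f_mid < 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def choquet_integral(scores, densities, lam):
    """Fuse one score per data source with the g-lambda measure.

    scores[i] is the base classifier's probabilistic output for source i.
    """
    # Visit the sources from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fused = 0.0
    g_prev = 0.0  # measure of the set of sources visited so far
    for rank, i in enumerate(order):
        # g(A U {x_i}) = g_i + g(A) + lambda * g_i * g(A)
        g_cur = densities[i] + g_prev + lam * densities[i] * g_prev
        is_last = rank + 1 == len(order)
        next_score = 0.0 if is_last else scores[order[rank + 1]]
        fused += (scores[i] - next_score) * g_cur
        g_prev = g_cur
    return fused


# Illustrative values only (not taken from the thesis).
example_densities = [0.4, 0.3, 0.2]
example_scores = [0.9, 0.2, 0.5]
example_lambda = solve_lambda(example_densities)
example_fused = choquet_integral(example_scores, example_densities, example_lambda)
```

In this sketch only the n densities are supplied; the measure of every nested subset is generated on the fly by the recursion inside `choquet_integral`, which is why the gλ measure avoids specifying a value for each of the 2^n subsets by hand.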
In this paper the fuzzy densities are optimized by the particle swarm optimization algorithm.

In this paper, we use eight data sources. For convenience of comparison, they are divided into two groups, A and B. In addition, we use FunCat (Functional Catalogue) as the functional taxonomy, and 15 of its root classes are selected. The experimental results show that our fuzzy integral fusion method achieves better results than prediction from a single data source, and that the prediction results are significantly improved by our heterologous data fusion method. It also achieves better results than weighted average fusion, data-level fusion with a support vector machine, the single-data-source K-nearest neighbor method, and the multiple classifier ensemble. The results also show that the interactions among the data sources have a direct impact on the effect of fuzzy integral fusion.
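
The particle swarm optimization of the fuzzy densities mentioned above can be sketched in a generic form. This is a plain global-best PSO over a box, not the thesis's implementation: the swarm size, iteration count, inertia weight `w`, acceleration coefficients `c1` and `c2`, and the bounds are all assumed values, and `toy_objective` is a hypothetical stand-in. In practice the objective would presumably be a validation error of the fused predictions as a function of the densities.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, low=0.01, high=0.99, seed=0):
    """Minimise objective(position) over the box [low, high]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(low, high) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Personal bests and the swarm-wide (global) best.
    pbest = [p[:] for p in pos]
    pbest_cost = [objective(p) for p in pos]
    best_idx = min(range(n_particles), key=lambda k: pbest_cost[k])
    gbest, gbest_cost = pbest[best_idx][:], pbest_cost[best_idx]

    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull towards personal best + pull towards global best.
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                # Keep each density candidate inside the open interval (0, 1).
                pos[k][d] = min(high, max(low, pos[k][d] + vel[k][d]))
            cost = objective(pos[k])
            if cost < pbest_cost[k]:
                pbest[k], pbest_cost[k] = pos[k][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[k][:], cost
    return gbest, gbest_cost


def toy_objective(densities):
    """Hypothetical stand-in: squared distance to an arbitrary target."""
    target = [0.4, 0.3, 0.2]
    return sum((d - t) ** 2 for d, t in zip(densities, target))


best_densities, best_cost = pso_minimize(toy_objective, dim=3)
```

Each particle position is one candidate vector of fuzzy densities, one entry per data source, so the search space has only n dimensions for n data sources.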
Keywords/Search Tags: Fuzzy Measure, Particle Swarm Optimization, Fuzzy Integral, Data Fusion, Protein Function Prediction