An Application Research Of SVM And ESOM Of Biology Sequence Comparison And Prediction

Posted on: 2011-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: F Zhao
Full Text: PDF
GTID: 2120360305466930
Subject: Computer software and theory
Abstract/Summary:
In the new century, with the rapid and comprehensive development of biological theory, bioinformatics based on biological sequence comparison has received a great deal of attention and has entered a period of rapid development. Advances in biological sequence comparison make it more convenient to process massive amounts of biological data and help to explore and reveal the information underlying all of life.
In the past, research on biological sequences was based mainly on traditional statistical theory. Processing massive data in this way involved considerable randomness and uncertainty and was quite obviously influenced by human factors. Because traditional statistics mainly applied linear approaches to the analysis of biological data, processing took too long and efficiency was low. It is therefore a useful attempt to apply SVM (Support Vector Machine) and ESOM (Emergent Self-Organizing Map), both of which perform well in data analysis, to the study of biological sequence comparison.
SVM performs well on high-dimensional data classification and has a strong generalization ability. Neural networks offer associative memory, optimization computation, knowledge processing, classification, identification and nonlinear mapping. ESOM is based on the idea of emergence in the natural world. Its output layer is composed of a large number of neurons, and it can map the clustering result onto a borderless toroidal grid, which yields an intuitive graphical visualization. Using SVM and ESOM for gene sequence comparison can solve problems of traditional methods, in which the classification result is not intuitive and the competition and cooperation among neurons are inadequate.
In this thesis, SVM and ESOM are combined for gene sequence comparison, and the identification model is established through the research and experiment process.
First, the gene sequence data are pre-processed, that is, each sequence is encoded as real numbers. The encoded data are then normalized, and cross-validation is performed on the normalized data to achieve feature extraction. Second, a reasonable network model is built with the ESOM network, which involves constructing a reasonable output layer, determining the training mode, initializing the weight vectors, and selecting the neighborhood radius and the decay strategy of the learning rate. Finally, the sample data are used to train the ESOM network model and obtain the clustering results, which are visualized graphically. The test data are then fed into the model for classification and prediction to obtain the final results. Because classification is applied twice, the precision is relatively high. According to the results, the model proposed in this thesis performs well and is highly reliable for solving the problem. ESOM therefore shows good promise for future research and application.
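The steps described above (numeric encoding, normalization, weight initialization, neighborhood radius, learning-rate decay, toroidal output grid) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The base-to-number mapping `BASE_VALUES`, the grid size, the linear decay schedule and all function names are assumptions chosen for illustration. The sketch is a small toroidal self-organizing map rather than a full ESOM, it assumes sequences of equal length, and it omits the SVM and cross-validation stages.

```python
import math
import random

# Hypothetical encoding: the abstract only says bases are "assigned some real numbers".
BASE_VALUES = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}

def encode(seq):
    """Map each base to a real number, then min-max normalize to [0, 1]."""
    lo, hi = min(BASE_VALUES.values()), max(BASE_VALUES.values())
    return [(BASE_VALUES[b] - lo) / (hi - lo) for b in seq.upper()]

def torus_dist(a, b, rows, cols):
    """Grid distance between neurons a and b when both axes wrap around."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return math.hypot(min(dr, rows - dr), min(dc, cols - dc))

def best_match(weights, x):
    """Return the grid position of the neuron whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda pos: sum((w - v) ** 2 for w, v in zip(weights[pos], x)),
    )

def train_som(data, rows=6, cols=8, epochs=30, lr0=0.5, radius0=3.0, seed=0):
    """Train a toroidal SOM with linearly decaying learning rate and radius."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Random weight initialization, one vector per grid position.
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                    # learning-rate decay
        radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighborhood
        for x in rng.sample(data, len(data)):      # shuffled pass over the data
            bmu = best_match(weights, x)
            for pos, w in weights.items():
                d = torus_dist(pos, bmu, rows, cols)
                h = math.exp(-(d * d) / (2.0 * radius * radius))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights

if __name__ == "__main__":
    sequences = ["AAAAAAAA", "AAAACAAA", "TTTTTTTT", "TTTTGTTT"]
    data = [encode(s) for s in sequences]
    som = train_som(data)
    for s, x in zip(sequences, data):
        print(s, "->", best_match(som, x))
```

Running the script prints the grid position of the best-matching neuron for each toy sequence. On a toroidal grid, positions near opposite edges are neighbors, which is why `torus_dist` takes the shorter of the direct and wrapped offsets on each axis.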
Keywords/Search Tags: Sequence Comparison, Support Vector Machine, Emergent Self-Organizing Map, Clustering Analysis, Visualization