Gene Sequence Network Based On Evolutionary Analysis

Posted on: 2017-04-07  Degree: Master  Type: Thesis
Country: China  Candidate: Y Wang  Full Text: PDF
GTID: 2180330485469643  Subject: Software engineering
Abstract/Summary:
The scale of biological data in the era of big data poses a great challenge to traditional life science analysis and experimental methods, and the complex relationships among data elements are difficult for traditional models to describe and process. Bioinformatics, an emerging field that draws on the power of computing, addresses this problem by constructing models and adapting classic algorithms to support research and development in life science. To advance life science further, inheriting and improving on traditional methods is both necessary and meaningful.

Because existing methods do not consider the specific functions of data sequences, there is a considerable gap between analysis results and actual experimental observations on large-scale data. Moreover, biological data analysis is a non-deterministic problem, so applying deterministic methods to it is contradictory and error-prone. We therefore adopt non-deterministic methods to increase the accuracy of the results and to shorten computation time, so that the method is both effective and efficient.

We carried out two main pieces of work to address these two issues. First, we borrow data mining techniques from text analysis and natural language processing: after preprocessing the data and constructing a vector library, we calculate confidence-level probabilities and analyze conserved sequence patterns. Second, we use the Monte Carlo method to simulate phylogenetic analysis and select the best result from large-scale simulation tests.
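The Monte Carlo step can be illustrated with a minimal Python sketch. The abstract does not specify how trees are sampled or scored, so everything here is an assumption made for illustration: the function names, the random pairwise joining of clusters, and the scoring rule (sum of average inter-cluster distances at each join, lower is better) are hypothetical and are not the thesis's actual procedure.

```python
import random


def cluster_distance(a, b, dist):
    """Average pairwise distance between two clusters of leaf names."""
    pairs = [(x, y) for x in a for y in b]
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)


def random_tree(leaves, dist, rng):
    """Build one random binary tree by repeatedly joining two random clusters.

    Returns (tree as nested tuples, total join cost). The cost function is an
    assumed stand-in for whatever criterion a real analysis would use.
    """
    clusters = [((name,), name) for name in leaves]  # (leaf names, subtree)
    cost = 0.0
    while len(clusters) > 1:
        i, j = rng.sample(range(len(clusters)), 2)
        (leaves_a, tree_a), (leaves_b, tree_b) = clusters[i], clusters[j]
        cost += cluster_distance(leaves_a, leaves_b, dist)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((leaves_a + leaves_b, (tree_a, tree_b)))
    return clusters[0][1], cost


def monte_carlo_search(leaves, dist, n_samples=1000, seed=0):
    """Sample many random trees and keep the one with the lowest cost."""
    rng = random.Random(seed)
    best_tree, best_cost = None, float("inf")
    for _ in range(n_samples):
        tree, cost = random_tree(leaves, dist, rng)
        if cost < best_cost:
            best_tree, best_cost = tree, cost
    return best_tree, best_cost
```

Here `dist` maps a `frozenset` of two leaf names to their distance. Drawing many samples and keeping the best one mirrors the "choose the best result in large-scale simulation tests" idea, but a real phylogenetic analysis would normally score trees with a likelihood or parsimony criterion instead of this toy cost.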
To account for the specific function and meaning of data sequences, we introduce a new method that uses conserved sequence fragments to reduce the computation from millions of base pairs to hundreds of fragments, which not only reduces the time and space cost but also increases the confidence level and reliability of the analysis results.

The experimental results show that our method outperforms traditional methods in time and space cost to varying degrees, while also taking specific sequence functions and meanings into account. The results we generated fit the observed data and provide an explanation based on the analysis of conserved sequence fragments. With the Monte Carlo method, our work comes closer to the pattern of natural evolution, and our results are more comprehensive and complete than those of traditional methods. We believe our method can make up for the shortcomings of conventional analysis methods and provide a reliable tool for phylogeny and life science.
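The fragment-based reduction described above can be sketched in Python as follows. The abstract does not say how conserved fragments are detected or compared, so this sketch makes its own assumptions: fragments are fixed-length k-mers, a fragment counts as conserved when it occurs in at least `min_count` sequences, and sequences are compared by Jaccard distance over binary fragment vectors. All function and parameter names are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_fragments(seqs, k, min_count):
    """Return the sorted k-mers that occur in at least min_count sequences.

    `seqs` maps a sequence name to its string. The threshold rule is an
    assumed criterion, chosen only to illustrate the idea.
    """
    counts = {}
    for s in seqs.values():
        for frag in kmers(s, k):
            counts[frag] = counts.get(frag, 0) + 1
    return sorted(f for f, c in counts.items() if c >= min_count)


def fragment_vector(seq, fragments):
    """Binary vector: 1 where the sequence contains the fragment, else 0."""
    return [1 if f in seq else 0 for f in fragments]


def jaccard_distance(u, v):
    """1 - |intersection| / |union| for two binary vectors."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either
```

Each sequence is then represented by a short vector over the conserved fragments instead of its full base-pair string, which is where the reduction in time and space cost would come from under these assumptions.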
Keywords/Search Tags: phylogeny, sequence analysis, Monte Carlo method, stochastic model, text mining