Research On DNA Sites Prediction Based On The Machine Learning Approach

Posted on: 2017-09-24  Degree: Master  Type: Thesis
Country: China  Candidate: G Y Zou
GTID: 2310330512461364  Subject: Statistics
Abstract/Summary:
With the development of biotechnology, and especially since the completion of the Human Genome Project (HGP), gene sequencing has produced large amounts of biological data. Interpreting and exploring this information has therefore become particularly important, and building machine-learning models of the attributes, properties, and functions of DNA sequences is crucial. This thesis mainly studies the prediction of human gene recombination hotspots and coldspots and of DNase I hypersensitive sites (DHSs). The work covers the following aspects:

1) Several commonly used feature extraction methods are introduced, and research progress on machine-learning prediction of DNA functional sites is summarized. Frequently used machine learning methods are reviewed, such as support vector machines, random forests, deep sparse auto-encoders, hidden Markov model classification, and Bayesian classification algorithms. Evaluation indexes for classification algorithms are also systematically analyzed.

2) Analysis and prediction of recombination hotspots and coldspots in human gene sequences. Genetic recombination is an important process for all life: it exchanges genetic information and promotes evolution. In this thesis, trinucleotide codons are mapped to the amino acids they encode in order to incorporate the long-range or global sequence-order information of DNA, and the physicochemical properties of the amino acids are combined with dinucleotide composition and pseudo amino acid composition to transform the DNA sequence into a discrete model. A predictor is then built with the SVM algorithm, and the jackknife cross-validation test shows that it performs considerably better than existing predictors. To improve its practical value and to make it convenient for experimental researchers, an online prediction website was created.

3) Analysis and prediction of DHSs. DHSs are important markers of DNA regulatory elements, and the regulatory elements enriched in them are involved in a variety of regulatory activities and base modifications. Locating and analyzing DHSs across the whole genome is therefore very important for analyzing gene transcription regulation. In this thesis, the relationship between DHSs and base composition is first examined through an analysis of nucleotide frequency. Classification models are then constructed using three different methods of extracting feature vectors from DNA sequences: auto-covariance and cross-covariance, principal component analysis (PCA), and the physicochemical properties of trinucleotides. The validity of the methods is confirmed by cross-validation tests.

4) Finally, the research work is summarized and future work is outlined, including improvement of the feature extraction methods and classification algorithms, extraction of structural information from DNA, and further analysis of other problems of DNA function.
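As a rough illustration of two ideas named in the abstract, dinucleotide composition as a feature vector and the jackknife (leave-one-out) test, the following minimal Python sketch may help. It is not the thesis's implementation: the function names are hypothetical, the sequences are made-up toy data, and a simple nearest-centroid classifier stands in for the SVM so that the example needs only the standard library.

```python
from itertools import product
from math import dist

# The 16 possible dinucleotides, in a fixed order (AA, AC, ..., TT).
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_composition(seq):
    """Return the 16 normalized dinucleotide frequencies of a DNA sequence."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing N or other symbols
            counts[pair] += 1
    total = sum(counts.values()) or 1  # avoid division by zero
    return [counts[d] / total for d in DINUCLEOTIDES]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def jackknife_accuracy(features, labels):
    """Leave-one-out test: hold out each sample, fit on the rest, predict it.

    Assumption: a nearest-centroid rule replaces the SVM used in the thesis.
    """
    correct = 0
    for i in range(len(features)):
        train = [(f, y) for j, (f, y) in enumerate(zip(features, labels)) if j != i]
        centroids = {
            label: centroid([f for f, y in train if y == label])
            for label in {y for _, y in train}
        }
        predicted = min(centroids, key=lambda c: dist(features[i], centroids[c]))
        correct += predicted == labels[i]
    return correct / len(features)


# Toy, made-up sequences (not real hotspot/coldspot data).
sequences = ["GCGCGCGCGC", "GGCCGGCCGC", "CGCGGCGCCG",
             "ATATATATAT", "AATTAATTAT", "TATAATATTA"]
labels = ["hot", "hot", "hot", "cold", "cold", "cold"]

features = [dinucleotide_composition(s) for s in sequences]
accuracy = jackknife_accuracy(features, labels)
print(f"jackknife accuracy on toy data: {accuracy:.2f}")
```

In a real setting the toy sequences would be replaced by labelled hotspot and coldspot sequences, and the nearest-centroid rule by an SVM or another classifier; the leave-one-out loop itself stays the same.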
Keywords/Search Tags: recombination hotspots and coldspots, DNase I hypersensitive sites, deep sparse auto-encoder, cross validation, principal component analysis