Theoretical Prediction Of Nucleosome Position And Online Software Development

Posted on: 2018-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: C J Zhang
Full Text: PDF
GTID: 2310330515451784
Subject: Biophysics
Abstract/Summary:
Nucleosomes, formed by DNA wrapped around histone octamers, are the basic structural units of chromosomes in eukaryotes. Gene transcription regulation, DNA replication and repair, the formation of higher-order spatial structure of DNA within the nucleus, and other biological processes depend on the distribution and arrangement of nucleosomes. Rapid progress in genome sequencing has produced a huge amount of sequence data, so how to build fast and effective algorithms that precisely identify nucleosomes, an epigenetic feature, has become a new challenge for bioinformatics. Research teams in China and abroad have developed a number of theoretical methods for nucleosome prediction. However, most of these methods have shortcomings, such as redundancy, incomplete extraction of nucleosome features, and the lack of a free, user-friendly service interface. This motivated us to develop a new prediction method based on the structural attributes and long-range correlation information of nucleosome sequences, which makes up for some of the shortcomings of current approaches to nucleosome positioning.

In this thesis, we extract features from the sequence composition and the physicochemical properties of nucleosome and origin of replication (ORI) sequences, and then construct prediction models with the support vector machine (SVM) and random forest (RF) algorithms. Model performance is evaluated comprehensively, with other machine learning algorithms used for comparison. We also developed online prediction software for use by researchers. Finally, we use the model to analyze the distribution of nucleosomes around human replication origins.

We collected nucleosome and ORI sequences from the relevant literature and databases, then screened the data and removed highly similar sequences. Next, we extracted features of the nucleosome and ORI sequences based on pseudo nucleotide composition, which captures not only the short-range correlation information of a DNA sequence but also its long-range correlation information. Leave-one-out cross-validation shows that the nucleosome prediction model reaches an accuracy of 87.38% and an auROC of 0.933, and that the human ORI prediction model reaches an accuracy of 75% and an auROC of 0.835. To better evaluate the models, we also compared them with other machine learning algorithms, including decision tree and naive Bayes. The results indicate that our models have an advantage on the various evaluation metrics. We have developed free online service software; researchers can visit http://lin.uestc.edu.cn/server/iOri-Human.html to predict whether an unknown sequence is a nucleosome or ORI sequence. Finally, we used our nucleosome prediction model to analyze the distribution of nucleosomes near ORIs. The results indicate that the probability of nucleosome formation is lower in the region adjacent to the replication origin than in the flanking regions on both sides, which shows that the nucleosome prediction model built in this thesis has practical value.
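The abstract does not give the exact feature formulation, so the following is only a minimal sketch of the general idea behind pseudo nucleotide composition: k-mer frequencies carry the short-range information, and a few correlation factors between dinucleotides that lie further apart carry the long-range information. The function names, the parameters `k`, `lam`, and `w`, and in particular the `TOY_PROPERTY` table are assumptions made for illustration. The property values are made-up placeholders, not measured physicochemical data and not the properties used in the thesis.

```python
from itertools import product

# HYPOTHETICAL placeholder: one made-up numeric value per dinucleotide.
# A real study would use measured, standardized physicochemical properties.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]
TOY_PROPERTY = {d: (i % 4) / 3.0 - 0.5 for i, d in enumerate(DINUCS)}


def kmer_frequencies(seq, k):
    """Normalized frequency of every k-mer (short-range information)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1
    for i in range(total):
        counts[seq[i:i + k]] += 1
    return [counts[m] / total for m in kmers]


def correlation_factors(seq, lam, prop=TOY_PROPERTY):
    """Mean squared property difference between dinucleotides that are
    j positions apart, for j = 1..lam (long-range information)."""
    dinucs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    thetas = []
    for j in range(1, lam + 1):
        pairs = len(dinucs) - j
        total = sum((prop[dinucs[i]] - prop[dinucs[i + j]]) ** 2
                    for i in range(pairs))
        thetas.append(total / pairs)
    return thetas


def pseknc(seq, k=2, lam=3, w=0.5):
    """Return a feature vector of length 4**k + lam that sums to 1."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    if len(seq) < max(k, lam + 2):
        raise ValueError("sequence is too short for the chosen k and lam")
    freqs = kmer_frequencies(seq, k)
    thetas = correlation_factors(seq, lam)
    # w weights the long-range terms against the k-mer frequencies.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


features = pseknc("ACGTTGCAACGGTTAACCGT", k=2, lam=3, w=0.5)
print(len(features))
```

A vector like `features` could then be passed, one row per sequence, to a classifier such as an SVM or random forest. The values chosen here for `k`, `lam`, and `w` are arbitrary and would normally be tuned by cross-validation.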
Keywords/Search Tags:nucleosome, pseudo nucleotide composition, classification prediction, origin of replication, online service