
Prediction Of Nucleosome Positioning Based On DNA Sequences Characters

Posted on: 2016-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Su
GTID: 2180330461992562
Subject: Operational Research and Cybernetics
Abstract/Summary:
Nucleosomes are the basic structural units of chromatin in eukaryotic cells. The location of nucleosomes along the genomic DNA sequence is called nucleosome positioning, and it plays a key role in regulating many biological processes, such as transcription, replication, DNA repair and RNA splicing. Nucleosomes are also associated with some major diseases, for example through aberrant histone modification, so studying nucleosome positioning is of great value.

Information theory is a branch of probability theory and mathematical statistics. In this thesis, informational entropy and mutual information are applied to detect the information on nucleotide correlation stored in nucleosomal sequences. Informational entropy measures the average uncertainty of a variable; an entropy of 0 means the variable always takes the same value. Mutual information measures the correlation between two variables, and a higher value indicates a stronger correlation. First, we study the dinucleotide frequencies in nucleosomal and linker DNA sequences and find that the compositions of the two kinds of sequence are very different. Based on this, we use informational entropy and mutual information to examine pairs of nucleotides separated by a gap of length k. Analysis of a large amount of data shows that two nucleotides separated by a gap of length 1 or 2 are much more strongly correlated than nucleotides separated by longer gaps. This finding is used to construct a 32-dimensional feature vector suitable for classifying nucleosomal and linker sequences, and its effectiveness is verified with a support vector machine (SVM) and receiver operating characteristic (ROC) analysis.
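The entropy and mutual-information measures described above can be sketched in Python as follows. This is a minimal illustration, not the thesis's own code. It assumes that "separated by a gap of length k" means exactly k nucleotides lie between the two positions (so k = 0 is an ordinary adjacent dinucleotide), and it pools every position of every input sequence into a single joint distribution. The helper names `entropy`, `gapped_pairs` and `mutual_information` are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts.

    Returns 0.0 when only one outcome ever occurs (an invariable variable).
    """
    total = sum(counts.values())
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def gapped_pairs(sequences, k):
    """Yield (x, y) nucleotide pairs with exactly k nucleotides between x and y.

    Assumption: k = 0 gives adjacent dinucleotides; pairs containing
    anything other than A/C/G/T are skipped.
    """
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k - 1):
            x, y = seq[i], seq[i + k + 1]
            if x in BASES and y in BASES:
                yield x, y


def mutual_information(sequences, k):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) in bits, pooled over all positions."""
    joint = Counter(gapped_pairs(sequences, k))
    if not joint:
        return 0.0
    # Marginal counts of the first (X) and second (Y) nucleotide of each pair.
    px, py = Counter(), Counter()
    for (x, y), c in joint.items():
        px[x] += c
        py[y] += c
    return entropy(px) + entropy(py) - entropy(joint)
```

Evaluating `mutual_information(seqs, k)` for k = 0, 1, 2, ... separately on a nucleosomal set and a linker set is one way to produce the kind of gap-versus-correlation comparison the abstract reports. The identity I(X;Y) = H(X) + H(Y) - H(X,Y) is used so that the mutual information is built directly from the entropy function.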
The nucleosome positioning information model achieves high AUCs of 0.9237, 0.9068, 0.9175, 0.8482 and 0.9079 for Human, Medaka, Nematode, Candida and Yeast, respectively, significantly outperforming previous studies.

The main achievements of this thesis are as follows.

First, informational entropy and mutual information are applied to detect the information on nucleotide correlation stored in nucleosomal sequences. We find that two nucleotides separated by a gap of length 1 or 2 are much more strongly correlated than nucleotides separated by longer gaps.

Second, based on this finding, we construct a 32-dimensional feature vector that captures the characteristics of nucleotide pairs separated by a gap of length 1 or 2. This greatly simplifies the computation and reduces the calculation required for large nucleosome datasets.

Third, we use machine learning methods to predict nucleosome positions. Computational experiments on several nucleosome positioning datasets show that in all cases the proposed model gives better prediction performance than other models, which suggests that our feature vector contains important signatures of nucleosome positioning.

Our 32-dimensional feature vector is built on the findings from informational entropy and mutual information, but the factors that influence nucleosome positioning are complicated, including ATP-dependent remodeling, competition and cooperation among protein molecules, and the dependence on the DNA sequence. A more systematic analysis of these factors, combining the spatial distribution of nucleosomes with the physicochemical properties of DNA sequences, could yield a more comprehensive nucleosome positioning model and better predictions. In addition, nucleosome positioning mechanisms differ between organisms, so the model needs to be applied further to more complex eukaryotes. Currently, predictions derived by different methods differ from one another.
Therefore, we need the help of experimental methods to further determine the accuracy of our model.
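A 32-dimensional feature vector of the kind summarized above, together with an AUC score for ranked classifier outputs, might be computed as below. This sketch assumes the 32 features are the relative frequencies of the 16 possible nucleotide pairs (AA, AC, ..., TT) at gap 1 and at gap 2, with the gap again counting the nucleotides between the two positions; the abstract does not spell out the exact encoding. The function names `gapped_pair_frequencies`, `feature_vector` and `roc_auc` are hypothetical. The SVM step itself is not reproduced here.

```python
from itertools import product

BASES = "ACGT"
# The 16 ordered nucleotide pairs: AA, AC, AG, AT, CA, ..., TT.
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]


def gapped_pair_frequencies(seq, k):
    """Relative frequency of each of the 16 pairs with k bases between them.

    Pairs containing non-ACGT characters are ignored; a sequence too short
    to contain any pair yields all zeros.
    """
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def feature_vector(seq, gaps=(1, 2)):
    """Concatenate the pair frequencies for each gap: 2 gaps x 16 = 32 features."""
    vec = []
    for k in gaps:
        vec.extend(gapped_pair_frequencies(seq, k))
    return vec


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Equals the probability that a randomly chosen positive (label 1) scores
    higher than a randomly chosen negative (label 0); ties count as 1/2.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

In a full pipeline the vectors from `feature_vector` would be fed to an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`), and its decision scores on held-out nucleosomal (label 1) and linker (label 0) sequences passed to `roc_auc`, which computes the same quantity as `sklearn.metrics.roc_auc_score`. The pairwise loop is O(positives x negatives) and is written for clarity rather than speed.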
Keywords: Nucleosome positioning, Informational entropy, Nucleosomal DNA, Mutual information, Linker DNA, Support vector machine (SVM)