Nucleosome Positioning By Sequence Information And Deep Learning

Posted on: 2022-07-07
Degree: Master
Type: Thesis
Country: China
Candidate: Q Li
Full Text: PDF
GTID: 2480306737453324
Subject: Applied Statistics
Abstract/Summary:
As the basic structural unit of chromatin in eukaryotes, nucleosomes not only compact chromatin but also play a key role in genome expression, DNA replication, repair, and other life processes. Studying the precise positioning of nucleosomes along genome-wide DNA sequences is therefore of considerable biological significance. With the continuing advancement of biotechnology and computer technology, the volume of biological data is growing explosively, and relying only on biochemical experiments to study nucleosome positioning is costly and time-consuming. Efficient and accurate computational methods for nucleosome positioning have therefore become an important research need. In this thesis, we propose several new nucleosome positioning models based on image representations and word vector representations of DNA sequences, respectively, using machine learning and deep learning algorithms, and we validate their effectiveness on relevant benchmark datasets.

First, DNA sequences were represented graphically using the frequency chaos game representation (FCGR). Nucleosome positioning models based on this image representation were then built using a support vector machine, an extreme learning machine and a convolutional neural network, respectively. The classification accuracy of the models under 10-fold cross-validation was calculated on the H. sapiens, C. elegans, D. melanogaster and S. cerevisiae datasets. The results show that FCGR features can feasibly be applied to nucleosome positioning, and that a sequence is better represented by combining several FCGR features of different dimensions. The highest classification accuracies reached 87.08%, 87.54%, 81.13% and 100% on the four datasets, respectively.

Second, word vectors of DNA sequences were trained using k-mers and the word2vec model, and three deep learning models with different network structures were constructed. Experimental results under 10-fold cross-validation show that the NP_CBiR model, which integrates a convolutional neural network, a bidirectional GRU and a bidirectional long short-term memory neural network, has better prediction performance. This may be because it combines the advantages of different network structures in feature extraction and can effectively capture both the local features and the base-order features of DNA sequences. Compared with other methods, the NP_CBiR model achieved the highest classification accuracies of 86.18%, 89.39%, 85.55% and 100% on the four datasets of H. sapiens, C. elegans, D. melanogaster and S. cerevisiae, respectively, and improved the AUC values on eight additional sequence datasets from H. sapiens, D. melanogaster and S. cerevisiae. These results demonstrate that DNA sequence word vectors can effectively represent sequence features.
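
The two sequence representations named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The corner assignment of the four bases, the default values of k, and the function names fcgr, fcgr_features and kmer_tokens are all assumptions made for illustration. The sketch only shows how a DNA sequence might be turned into an FCGR frequency matrix, how matrices of several dimensions could be concatenated into one feature vector, and how a sequence could be split into k-mer "words" before word2vec training.

```python
# Corner assignment for the chaos game square. This is an assumed
# convention; published FCGR variants place the four bases differently.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq: str, k: int) -> list[list[float]]:
    """Return a 2^k x 2^k matrix of k-mer frequencies (rows = y, columns = x)."""
    size = 1 << k
    grid = [[0.0] * size for _ in range(size)]
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers containing N or other ambiguity codes
        x = y = 0
        # The last base of a k-mer selects the largest quadrant, so it
        # supplies the most significant bit of each coordinate.
        for base in reversed(kmer):
            bx, by = CORNERS[base]
            x = (x << 1) | bx
            y = (y << 1) | by
        grid[y][x] += 1
        total += 1
    if total:
        grid = [[count / total for count in row] for row in grid]
    return grid


def fcgr_features(seq: str, ks: tuple[int, ...] = (2, 3)) -> list[float]:
    """Concatenate flattened FCGR matrices of several dimensions.

    The default ks is a hypothetical choice, not taken from the thesis.
    """
    features: list[float] = []
    for k in ks:
        for row in fcgr(seq, k):
            features.extend(row)
    return features


def kmer_tokens(seq: str, k: int, stride: int = 1) -> list[str]:
    """Split a sequence into overlapping k-mer 'words' for embedding training."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]
```

A feature vector from fcgr_features could be passed to a classifier such as a support vector machine, and the token lists from kmer_tokens could serve as the "sentences" given to a word2vec trainer. Both uses are sketched here only in outline.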
Keywords/Search Tags:Nucleosome positioning, Frequency chaos game representation, Word vector, Machine learning, Deep learning