Research on piRNA and Promoter Based on Sequence Information

Posted on: 2019-12-01 | Degree: Master | Type: Thesis
Country: China | Candidate: F Yang
GTID: 2370330566498676 | Subject: Computer Science and Technology
Abstract/Summary:
With the advent of the post-genomic era and the rapid development of gene sequencing technology, more and more biomolecular sequences have become available. Biomolecules differ in sequence length and in the arrangement of their basic units. As a result, they also differ in sequence composition, molecular structure, and physicochemical properties, which leads to differences in their types and functions. The complexity of these arrangements also makes biomolecules difficult to analyze with traditional biological experiments. With the development of machine learning, biomolecules can be analyzed and identified quickly by extracting features from their sequences and passing those features to machine learning methods. Analyzing the types, structures, and functions of biomolecules from the sequence information of known biomolecules is therefore becoming one of the most important research tasks in bioinformatics. In this context, a variety of feature extraction methods have been combined with machine learning methods to explore the types and functions of biomolecules. Piwi-interacting RNAs (piRNAs) help maintain the stability of germ cells, and promoters are important elements of gene regulation. We therefore applied multiple feature extraction methods, combined with machine learning algorithms, to the study of piRNAs and promoters. The thesis consists of the following parts.

First, a method for identifying piRNAs and their functions based on sequence information. Existing piRNA identification methods mainly combine nucleotide composition with transposon information and have achieved good results. In this chapter, PseKNC (pseudo k-tuple nucleotide composition) is applied to piRNA identification, because this feature extraction method performs well in a variety of molecular identification tasks. The features it extracts contain sequence composition information and also physicochemical properties, so they describe piRNA sequences more completely. Comparative experiments show that the proposed method outperforms existing sequence-based methods for piRNA identification. Building on this, a two-layer predictor was constructed to identify piRNAs and then their functions. It achieves good prediction performance on both piRNA identification and function identification.

Second, a method for identifying promoters and their types based on sequence information. Promoter identification relies mainly on the RNA polymerase binding sites in promoter sequences. To capture this characteristic, each promoter sequence is divided into subsequences with a sliding window, and PseKNC features are extracted from each subsequence. The method thus combines the sequence characteristics of promoters with the strengths of PseKNC feature extraction to characterize promoter sequences more effectively. In addition, a two-layer model is designed in which promoter type identification is treated as a multiclass classification problem. This model achieves good performance.

Third, a method for identifying promoters and their types based on multiscale windows. This method builds on the local conservation of promoter sequences and the effectiveness of sliding-window feature extraction to improve the promoter identification method above. Local sequence conservation is used to segment each promoter sequence adaptively, and promoter features are extracted from the resulting segments. Experiments show that the proposed model improves the accuracy of promoter identification.
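The abstract does not state the exact PseKNC settings used in the thesis. The following is therefore only a minimal Python sketch of the general technique. It assumes the commonly described parallel-correlation form of PseKNC: the normalized k-tuple frequencies are followed by λ correlation factors, each weighted by w. Each correlation factor is the mean squared difference of normalized dinucleotide physicochemical properties between dinucleotides j positions apart. The following choices are hypothetical and are not taken from the thesis:

- the parameter values (k = 2, λ = 3, w = 0.5),
- the window and step sizes,
- the names `pseknc`, `windowed_pseknc`, and `TOY_PROPS`.

`TOY_PROPS` is a synthetic placeholder, not real physicochemical data. `windowed_pseknc` illustrates the sliding-window idea used for promoters by concatenating the PseKNC vector of each window.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]

# Hypothetical placeholder: one synthetic "property" per dinucleotide (its index
# scaled to [0, 1]). Replace with real normalized physicochemical property values.
TOY_PROPS = {d: (i / 15,) for i, d in enumerate(DINUCLEOTIDES)}


def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Normalized frequencies of all 4**k k-tuples, in lexicographic order."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return [counts[m] / total for m in kmers]


def correlation_factor(seq: str, j: int, props: dict) -> float:
    """j-tier correlation factor: mean squared difference of the normalized
    dinucleotide properties of dinucleotides that lie j positions apart."""
    n = len(seq) - j - 1
    total = 0.0
    for i in range(n):
        a, b = props[seq[i:i + 2]], props[seq[i + j:i + j + 2]]
        total += sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return total / n


def pseknc(seq: str, k: int = 2, lam: int = 3, w: float = 0.5,
           props: dict = TOY_PROPS) -> list[float]:
    """PseKNC vector of length 4**k + lam (parallel-correlation form, assumed)."""
    seq = seq.upper().replace("U", "T")  # treat RNA (e.g. piRNA) in the DNA alphabet
    if set(seq) - set(NUCLEOTIDES):
        raise ValueError("sequence must contain only A, C, G, T/U")
    if lam < 1 or len(seq) < max(k, lam + 2):
        raise ValueError("need lam >= 1 and len(seq) >= max(k, lam + 2)")
    freqs = kmer_frequencies(seq, k)
    thetas = [correlation_factor(seq, j, props) for j in range(1, lam + 1)]
    denom = sum(freqs) + w * sum(thetas)
    # Composition part followed by the weighted sequence-order correlation part.
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def windowed_pseknc(seq: str, window: int, step: int, **kwargs) -> list[float]:
    """Slide a fixed-size window along the sequence and concatenate the PseKNC
    vector of every window (window and step are hypothetical choices)."""
    if len(seq) < window:
        raise ValueError("sequence shorter than window")
    features: list[float] = []
    for start in range(0, len(seq) - window + 1, step):
        features.extend(pseknc(seq[start:start + window], **kwargs))
    return features


if __name__ == "__main__":
    v = pseknc("ACGUACGGUACCUAGGCAUU")  # toy piRNA-like sequence
    print(len(v))  # 19 = 4**2 + 3
```

Every component is divided by the same denominator, so the composition part and the correlation part together sum to 1. The weight w controls how much the sequence-order correlation counts relative to plain composition. In a full pipeline, these vectors would then be passed to classifiers such as the support vector machine, random forest, or gradient boosting decision tree models listed in the keywords.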
Keywords/Search Tags: piRNA identification, promoter identification, support vector machine, random forest, gradient boosting decision tree, ensemble learning