
Identification And Prediction Of TracrRNA Based On Component Characteristics

Posted on: 2020-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Fang
GTID: 2370330596475254
Subject: Biophysics
Abstract/Summary:
The CRISPR-Cas system, an RNA-mediated adaptive immune system of bacteria and archaea, can specifically cleave exogenous nucleic acid sequences and has been developed into the most widely used gene editing tool. Some subtypes of type II CRISPR-Cas systems (e.g., II-A, II-B, II-C) rely on trans-activating CRISPR RNA (tracrRNA) both to mature the pre-crRNA and to interfere with invading sequences. After the tracrRNA:crRNA duplex has been processed by RNase III, it activates the CRISPR-associated endonuclease Cas9 (Csn1), which then cleaves the homologous target DNA site-specifically. Identifying tracrRNA therefore plays an important role in the research and development of genome editing tools based on new CRISPR-Cas systems.

In this thesis, 54 known tracrRNAs were collected as the positive training dataset, and these known tracrRNAs were randomly shuffled to construct a "fake tracrRNA" dataset. This "fake tracrRNA" dataset served as the negative training set, sharing the structural features and nucleotide composition of the known tracrRNAs. The training set was encoded with the pseudo k-tuple nucleotide composition (PseKNC) method, and the resulting feature set was used to train machine learning classifiers. During training, leave-one-out cross-validation was used to evaluate classifier performance. A feature selection technique based on analysis of variance (ANOVA) was used to optimize the features, removing irrelevant and redundant features during model construction. This yielded the tracrRNA classifier with the fewest features and the best performance under the optimal PseKNC parameters.

Among the support vector machine (SVM), naive Bayes, and random forest algorithms, the SVM clearly gave the best prediction performance on the training model. The classifier was then optimized by SVM-based feature selection. With the PseKNC parameters k, j, and w set to 5, 1, and 0.5, respectively, and 171 features, the trained tracrRNA classifier achieved its best predictive performance: a sensitivity of 98.15%, a specificity of 100%, an accuracy of 99.07%, a Matthews correlation coefficient (MCC) of 0.9816, and an area under the ROC curve of 0.998. These results indicate that the classifier distinguishes tracrRNA very well from "fake tracrRNA" that shares its structural features and nucleotide composition. The classifier provides a powerful auxiliary method for identifying new tracrRNAs and for designing and optimizing tracrRNAs in experiments.
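The negative-set construction and sequence encoding described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions: the abstract does not give the exact shuffling procedure, so a plain random permutation is used here (it preserves single-nucleotide composition only, not structure). The encoding also covers only the k-tuple composition part of PseKNC and leaves out the pseudo components that the weight w controls. The function names and the example sequence are hypothetical.

```python
import random
from collections import Counter
from itertools import product


def shuffle_sequence(seq, rng):
    """Return a random permutation of seq (single-nucleotide composition is preserved)."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def kmer_composition(seq, k):
    """Normalized frequency of each of the 4**k k-mers over the RNA alphabet ACGU."""
    windows = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(windows))
    total = max(windows, 1)  # avoid division by zero for very short inputs
    return [counts["".join(p)] / total for p in product("ACGU", repeat=k)]


# Arbitrary placeholder sequence (assumption): NOT one of the 54 tracrRNAs in the thesis.
example = "AUGGCUAACGUUAGCCGAUUACGGAUCCUAGCAAGU"

rng = random.Random(0)  # fixed seed so the sketch is reproducible
fake = shuffle_sequence(example, rng)

# The shuffled "fake" sequence keeps the same nucleotide counts as the original.
same_composition = Counter(example) == Counter(fake)

# k = 2 keeps the vector small for illustration; the thesis reports k = 5 as optimal.
features = kmer_composition(example, 2)
```

With k = 5 the composition part alone would have 4**5 = 1024 entries, which is why a feature selection step that reduces the vector (to 171 features in the thesis) is useful.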
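The reported performance figures can be cross-checked with a short worked example. The confusion-matrix counts below are an assumption, not data from the thesis: they suppose a negative set of 54 shuffled sequences (one per known tracrRNA) and a single misclassified positive in leave-one-out testing. The function name is hypothetical.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and Matthews correlation coefficient."""
    sn = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sn, sp, acc, mcc


# Assumed counts: 54 positives (53 correct, 1 missed) and 54 negatives (all correct).
sn, sp, acc, mcc = binary_metrics(tp=53, fn=1, tn=54, fp=0)
print(f"Sn={sn:.2%}  Sp={sp:.2%}  Acc={acc:.2%}  MCC={mcc:.4f}")
```

Under these assumed counts the sketch gives a sensitivity of 98.15%, a specificity of 100%, an accuracy of 99.07%, and an MCC of about 0.9816, which agrees with the figures in the abstract. The MCC lies on a scale from -1 to 1, so 0.9816 corresponds to the value quoted as 98.16%.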
Keywords/Search Tags: CRISPR-Cas system, TracrRNA, Machine learning, Identification, Prediction