Research On Improved RNA Secondary Structure Prediction Based On Transformer Model State Inference

Posted on: 2022-07-19
Degree: Master
Type: Thesis
Country: China
Candidate: D W Zhang
GTID: 2480306332965379
Subject: Software engineering
Abstract/Summary:
The multiple functions of RNA are determined by its basic structure. A comprehensive and accurate study of that structure helps to clarify the role of RNA in molecular biology and supports the development of RNA biotechnology. Determining the base-pairing status of each nucleotide is the first step in RNA secondary structure prediction. In recent years, using high-throughput probing data to improve the accuracy of thermodynamic model predictions has become an active research topic in RNA structure determination. In particular, structural information obtained from selective 2′-hydroxyl acylation analyzed by primer extension (SHAPE) experiments has greatly improved the accuracy of RNA secondary structure prediction models. Integrating such probing data as soft constraints into a thermodynamics-based RNA secondary structure prediction algorithm is equivalent to adding a suitable pseudo-energy term to the standard energy model of RNA secondary structure. In this paper, the problem of using machine learning techniques to determine which nucleotides in an RNA sequence are paired or unpaired in the secondary structure is called RNA state inference. A successful state inference algorithm gives a data-directed minimum free energy model auxiliary information for predicting RNA secondary structure. These state predictions can be integrated into secondary structure prediction through a minimum free energy model, referred to here as a prediction-state-directed minimum free energy model (NNTM). Converting the state predictions into synthetic SHAPE data (Shape-Data) to guide NNTM can substantially improve the accuracy of RNA secondary structure prediction, which in turn helps in studying higher-level RNA structures, improving RNA structure analysis algorithms, and analyzing RNA-RNA interactions.

This paper focuses on a recent research direction in bioinformatics: using Shape-Data as a soft constraint for NNTM to predict RNA secondary structure. It examines three experimental premises: 1. RNA state inference can be performed using deep learning models; 2. state inference results can be converted into Shape-Data by formulas; 3. the better the Shape-Data fit, the more accurate the NNTM prediction of RNA secondary structure. Based on the analysis of these three premises and the experiments reported here, the Transformer model achieves excellent state inference performance on the RNAStralign dataset. The experiments use the Transformer model to perform state inference on RNAStralign and evaluate it with three indicators: ACC, PPV, and SEN. All three exceed 90%. These results show that the multi-head attention mechanism of the Transformer model can learn the relationships between RNA sequences in the RNAStralign dataset well. Notably, RNAStralign contains eight families of RNA sequences, with the longest sequences containing thousands of nucleotides and the shortest fewer than one hundred. Achieving this performance on a dataset with such widely varying sequences shows that the Transformer model is a sound choice for state inference. It also shows that finding better models to predict Shape-Data as a soft constraint for NNTM-based RNA secondary structure prediction is a feasible research direction.
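To make the pseudo-energy idea concrete, the following is a minimal sketch of how a per-nucleotide reactivity could be turned into an energy term. It assumes the log-linear form commonly used for SHAPE-directed folding, with slope 2.6 kcal/mol and intercept -0.8 kcal/mol taken from the literature; the abstract does not state which formula or parameters the thesis uses, so the function names, the defaults, and the treatment of missing values are illustrative assumptions only.

```python
import math

# Assumed defaults (kcal/mol) from the SHAPE-directed folding literature;
# the thesis may use a different formula or different parameter values.
SLOPE_M = 2.6
INTERCEPT_B = -0.8


def shape_pseudo_energy(reactivity, m=SLOPE_M, b=INTERCEPT_B):
    """Pseudo-energy for one nucleotide: m * ln(reactivity + 1) + b.

    Low reactivity (likely paired) gives a negative value that rewards pairing;
    high reactivity (likely unpaired) gives a positive value that penalizes it.
    """
    # Assumption: None or a negative value marks missing data and adds no term.
    if reactivity is None or reactivity < 0:
        return 0.0
    return m * math.log(reactivity + 1.0) + b


def pseudo_energy_profile(reactivities):
    """Apply the pseudo-energy formula to every position of a sequence."""
    return [shape_pseudo_energy(r) for r in reactivities]


if __name__ == "__main__":
    # Hypothetical reactivities for a five-nucleotide stretch.
    for value in pseudo_energy_profile([0.0, 0.2, 1.0, None, 2.5]):
        print(round(value, 3))
```

A folding program that accepts soft constraints would add such a term to the free energy of each stacked base pair involving that nucleotide, so that the data bias the minimum free energy structure without forbidding any pairing outright.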
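The three indicators can be illustrated with a short sketch. It assumes the usual per-nucleotide definitions, with "paired" as the positive class: ACC is the fraction of correctly labeled nucleotides, PPV is TP / (TP + FP), and SEN is TP / (TP + FN). The abstract does not spell out these definitions, so the function name and the 1/0 label encoding are assumptions for illustration.

```python
def state_inference_metrics(true_states, predicted_states):
    """Compute ACC, PPV and SEN for per-nucleotide paired/unpaired labels.

    Both arguments are equal-length sequences of 1 (paired) or 0 (unpaired).
    This encoding is an assumption made for the example.
    """
    if len(true_states) != len(predicted_states):
        raise ValueError("label sequences must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(true_states, predicted_states):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1

    total = tp + fp + tn + fn
    # Guard against empty denominators so the function never divides by zero.
    acc = (tp + tn) / total if total else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    sen = tp / (tp + fn) if (tp + fn) else 0.0
    return {"ACC": acc, "PPV": ppv, "SEN": sen}


if __name__ == "__main__":
    # Hypothetical labels for an eight-nucleotide sequence.
    truth = [1, 1, 0, 0, 1, 0, 1, 1]
    guess = [1, 0, 0, 0, 1, 1, 1, 1]
    print(state_inference_metrics(truth, guess))
```

Reporting PPV and SEN alongside ACC matters because paired and unpaired nucleotides are rarely balanced, so accuracy alone can hide a model that over- or under-predicts pairing.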
Keywords: RNA secondary structure, SHAPE, NNTM, Transformer, LSTM, soft constraint