Research on RNA Secondary Structure Prediction Based on U-net Convolutional Neural Networks

Posted on: 2020-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: H Yao
Full Text: PDF
GTID: 2370330599961753
Subject: Theoretical Physics
Abstract/Summary:
Non-coding RNAs are not translated into proteins, but they play an indispensable role in the localization, replication, translation, degradation, regulation, and stability of biological macromolecules. These functions are often closely tied to the structure of the non-coding RNA. At present, measuring RNA structure experimentally is difficult and expensive, so more and more researchers study RNA structure by computational simulation. RNA secondary structure prediction is the basis of tertiary structure research and is also of great value for genome research, drug design, and related fields. With the wide application of machine learning and deep learning in artificial intelligence, computer vision, image processing, text processing, and speech recognition, many researchers have begun to use neural networks for RNA secondary structure prediction.

In this thesis, the U-net convolutional neural network, which has achieved good results in image segmentation, is improved. The focal loss function is introduced as the loss function of the network, which effectively addresses the imbalance between positive and negative samples in RNA secondary structure prediction. The training set comes from the RNA STRAND database and contains 1128 sequences shorter than 500 nt. The test set comes from the PDB database and contains 84 sequences shorter than 500 nt after similarity filtering.

With the network architecture and the data sets held fixed, several models are built from different input features. The PC-Unet model, based on the physical and chemical properties of RNA sequences, has an average PPV of 0.654, an STY of 0.667, and an MCC of 0.647. The DCA-Unet model, based on direct coupling analysis features, has an average PPV of 0.811, an STY of 0.654, and an MCC of 0.699. The MSA-Unet model, based on multiple sequence alignment features, has an average PPV of 0.803, an STY of 0.722, and an MCC of 0.742. Combining the multiple sequence alignment and direct coupling analysis features gives the DCA+MSA-Unet model, with an average PPV of 0.779, an STY of 0.731, and an MCC of 0.743. This does not improve on the single-feature models; performance instead declines, which is attributed to noise.

Therefore, a new method is proposed that combines the three models according to different weights. The best result for the combination of the DCA-Unet and MSA-Unet models is a PPV of 0.834, an STY of 0.655, and an MCC of 0.709. The best result for the combination of the PC-Unet and MSA-Unet models is a PPV of 0.838, an STY of 0.669, and an MCC of 0.726. The combination of the PC-Unet and DCA-Unet models gives the highest PPV: its best result is a PPV of 0.853, an STY of 0.628, and an MCC of 0.697, which the thesis reports as better than existing methods.
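The abstract names two technical ingredients without defining them: the focal loss used for training and the PPV, STY, and MCC scores used for evaluation. The following Python sketch illustrates both on a flattened binary base-pairing map. It is not the thesis code. The function names `focal_loss` and `pair_metrics` are hypothetical, the defaults `gamma=2.0` and `alpha=0.25` are the commonly used values from the original focal loss paper rather than values stated in the abstract, and STY is assumed to mean sensitivity, as is usual in the RNA structure literature.

```python
import math


def focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-12):
    """Mean binary focal loss over a flattened base-pairing map.

    probs  : predicted pairing probabilities in [0, 1]
    labels : 1 where two positions form a base pair, 0 otherwise
    gamma, alpha : assumed defaults, not taken from the thesis
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = p if y == 1 else 1.0 - p              # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # per-class weight
        # (1 - p_t) ** gamma down-weights positions that are already
        # classified well, so the many easy unpaired positions do not
        # dominate the few paired ones.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    return total / len(probs)


def pair_metrics(predicted, reference):
    """Return (PPV, STY, MCC) for two flattened binary pairing maps."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)

    ppv = tp / (tp + fp) if tp + fp else 0.0  # fraction of predicted pairs that are correct
    sty = tp / (tp + fn) if tp + fn else 0.0  # fraction of true pairs that are recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return ppv, sty, mcc


# Toy example: 8 candidate positions, 3 true pairs, 3 predicted pairs.
predicted = [1, 1, 0, 0, 0, 0, 1, 0]
reference = [1, 0, 0, 0, 0, 0, 1, 1]
ppv, sty, mcc = pair_metrics(predicted, reference)
print(f"PPV={ppv:.3f} STY={sty:.3f} MCC={mcc:.3f}")
```

With `gamma=0` and `alpha=0.5` the focal loss reduces to half the ordinary cross-entropy, which makes the role of the modulating factor easy to check. MCC is useful alongside PPV and STY because it also accounts for true negatives, which dominate in a pairing map.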
Keywords/Search Tags: secondary structure prediction of RNA, convolutional neural network, direct coupling analysis, multiple sequence alignment, physical and chemical properties